BIDS announces 2018 Data Science Research Projects

September 12, 2018

BIDS is pleased to announce the three award recipients of our 2018 Call for Data Science Research Projects.

As a world-class research institute at the leading edge of interdisciplinary data science and data-intensive research at the University of California, Berkeley, BIDS hosts many of the world's experts in statistics, machine learning, research software, and visualization, as applied across a wide array of academic disciplines. As more fields of research become data-driven, the use of software and the appropriate management, analysis, and maintenance of data become an increasingly cross-disciplinary effort.

These three projects address significant research questions, important not only in the Principal Investigators' respective fields but across data-driven research more broadly. Each project is expected to introduce novel data science techniques and methods, and each has the potential to yield open software tools. The lead PIs are all BIDS Senior Fellows who are seeking research outcomes with global impact for the greater data science community.

Laurel Larsen leads the project, Hydrological forecasting and the water/energy nexus. Clean, available water is a critical and often-overlooked resource for human activity, food production, biological diversity, and all aspects of life on our planet. Our current ability to predict where water will be, which rivers will be full or empty, which reservoirs will supply our cities, or how humanity's water needs will be met is woefully inadequate, and this ability becomes increasingly important as our climate changes. Drawing on data streams from a national network of research collaborators and water sensor networks, Larsen's team will create and consolidate databases, detailed data-driven models, and publicly available software systems. These will help understand and predict the availability and location of water within the complex, real-world catchments and basins where it naturally collects and drains, and forecast where and when water will be available. The software will apply tools from machine learning and information theory and support robust data analysis across many areas of environmental science. BIDS Data Science Fellow Zexuan Xu and Senior Fellow Fernando Pérez will also participate in this research collaboration.

Carl Boettiger leads the project, Detecting change in global biodiversity through large scale network analysis. The global ecosystems that support life are under dramatic pressure and are undergoing rapid, unprecedented change. Currently, a variety of separate scientific models, each with its own data sources, can model the activities of diverse parts of an ecosystem, such as soil, bacteria-root interactions, plant density and location, atmospheric and water composition, temperature, and many more. Each of these separate aspects of ecological modeling draws on a variety of resources and metrics for evaluating the stability of species and communities in their environments over time. Boettiger's team is building more accurate systems for analyzing the complex interactions that affect stability, resilience, and recovery, as well as mechanisms that can integrate and combine previously disparate ecological models. They plan to apply ecological network theory and big data ecology to understand changes in whole-ecosystem communities. BIDS Data Science Fellow Ciera Martinez, Senior Fellow Rosemary Gillespie, and UC Berkeley Professor of Integrative Biology Rasmus Nielsen will also participate in this research collaboration.

Joshua Bloom leads the project, Learning fundamental properties of physical systems with machine learning. The project seeks to develop data-driven methods and statistical modeling to uncover fundamental properties of observed physical systems without guidance from existing theory. Can, for instance, the degrees of freedom of a dynamical system be inferred from observation alone? Do data-driven models provide predictive power outside the physical regimes in which they are trained? The team aims to take the first steps in using software and data to provide a new pathway toward a theoretical understanding of the physical world. Data for this project span a diverse set of disciplines, including materials science and astrophysics. This research represents an early step in a potentially massive shift in how academic research teams can use collected data: not just to test or reject scientific models that already exist, or the models used to collect the data, but to use machine learning and inference to directly form new scientific models that define our scientific understanding. The field of data science is in the very early stages of that potential shift, and this project explores the possibility of that future. BIDS Senior Fellow Fernando Pérez will also participate in this research project, and the team plans to hold an international, community-wide workshop at BIDS in 2019.

The 2018 BIDS Data Science Research Projects set the stage for upcoming proposal calls. Over time, we intend to engage a broader range of data-intensive researchers across campus.



Featured Fellows

Laurel Larsen

Geography

Carl Boettiger

Environmental Science, Policy, & Management

Joshua Bloom

Astronomy; Center for Time-Domain Informatics
Co-I for Moore/Sloan Data Science Environments

Zexuan Xu

Climate and Ecosystem Science Division, LBNL

Fernando Pérez

Statistics
Co-I for Moore/Sloan Data Science Environments

Ciera Martinez

Molecular and Cell Biology

Rosemary Gillespie

Environmental Science, Policy, and Management; Berkeley Natural History Museums