Data-driven induction of critical timescales to improve the development of predictive models in the earth sciences

Berkeley Distinguished Lectures in Data Science


November 21, 2017
4:10pm to 5:00pm
190 Doe Library

Development and improvement of predictive models in the earth sciences will benefit water resource and natural hazards planning, as well as ecosystem management. However, the community has long struggled with the challenge of prediction because of the perceived complexity of earth systems, with their many observed and unobserved variables, the panoply of timescales that may play a role in the process of interest, and the professional fear of being wrong. Existing predictive models of earth systems tend to fall at either of two extremes of a “pendulum,” each of which has characteristics that are not ideal for prediction. At one extreme are physically grounded models that often involve many uncertain calibration parameters. These models represent small-scale phenomena well but perform more poorly at the landscape scale, particularly when integrating over heterogeneous surfaces and multiple storage timescales. At the other extreme are purely data-driven models that represent processes statistically but are not transferable to other locations and may not be suitable in a nonstationary climate. I use the example of discharge prediction in Dry Creek, ID, to demonstrate how emerging data-driven techniques can aid in the development of iterative forecasts that converge upon the modeling pendulum’s middle ground. Starting from the empirical extreme, support vector regression, a machine-learning technique, shows that reasonable discharge prediction accuracy can be achieved by aggregating precipitation data over a range of timescales. However, this approach is computationally costly and generalizes poorly. Alternatively, reasonable prediction skill is also attainable from variables indicative of storage (temperature and soil moisture) that are not aggregated in time.
The most flexible and informative results, however, are derived from using information flow analyses to identify dynamic critical timescales over which precipitation interacts with soil moisture and soil moisture interacts with discharge, followed by LASSO and sparse symbolic regression to identify the nature and strength of those interactions. The results potentially generalize to a physical model that treats the watershed as a nonlinear filter with multiple storage zones and resistors, and that should perform well for hydrologic prediction.
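As a rough illustration of what "aggregating precipitation data over a range of timescales" can look like in practice, the sketch below builds trailing precipitation totals over several window lengths, one feature row per time step. It is only a minimal sketch under stated assumptions: the function names, the window lengths, and the precipitation values are hypothetical and are not taken from the Dry Creek analysis. The resulting rows are the kind of input that could be passed to a regression method such as support vector regression or LASSO; that step is omitted here.

```python
from itertools import accumulate


def trailing_sums(series, window):
    """Sum of the last `window` values ending at each step (None until enough history)."""
    if window < 1:
        raise ValueError("window must be at least 1")
    # Prefix sums let each trailing total be computed with one subtraction.
    cumulative = [0] + list(accumulate(series))
    return [
        cumulative[i + 1] - cumulative[i + 1 - window] if i + 1 >= window else None
        for i in range(len(series))
    ]


def multiscale_features(precip, windows):
    """One feature row per time step, keeping only steps where every window is full."""
    columns = [trailing_sums(precip, w) for w in windows]
    start = max(windows) - 1  # first step with a complete longest window
    return [[col[t] for col in columns] for t in range(start, len(precip))]


# Hypothetical daily precipitation in mm (illustrative values, not Dry Creek data).
precip_mm = [0, 5, 10, 0, 0, 20, 5, 0]

# Hypothetical aggregation timescales of 1, 3, and 5 days.
features = multiscale_features(precip_mm, windows=[1, 3, 5])

for row in features:
    print(row)
```

Each row holds the 1-day, 3-day, and 5-day precipitation totals ending on the same day, so a model fitted on these columns can weight short and long timescales differently.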

The Berkeley Distinguished Lectures in Data Science, co-hosted by the Berkeley Institute for Data Science (BIDS) and the Berkeley Division of Data Sciences, features faculty doing visionary research that illustrates the character of the ongoing data, computational, and inferential revolution. In this inaugural Fall 2017 "local edition," we bring forward Berkeley faculty working in these areas as part of enriching the active connections among colleagues campus-wide. All campus community members are welcome and encouraged to attend. Arrive at 3:30pm for tea, coffee, and discussion.


Laurel Larsen

Associate Professor, Geography, UC Berkeley

Laurel is an assistant professor of earth systems science at the University of California, Berkeley, where she runs the Environmental Systems Dynamics Laboratory. Previously, she was a research ecologist and research hydrologist with the USGS in Reston, VA. Laurel’s research uses a variety of tools to identify the feedback processes driving environmental systems at the landscape scale. These tools include field and laboratory work, simulation modeling, and data-driven analysis using increasingly available environmental data from sensor networks and remote sensing platforms. Much of this work focuses on how water interacts with physical (e.g., sediment) and biological (e.g., plants) components of the environment, often in nonlinear ways that lead to thresholds, sudden shifts between alternate stable states, or chaotic behavior. Understanding these types of interactions enables anticipatory planning and improves the efficiency and effectiveness of restoration efforts. Her work has influenced restoration efforts in the Everglades, with ongoing work focusing on the Chesapeake Bay and the Wax Lake Delta, part of the greater Mississippi River delta complex. Laurel earned her PhD from the University of Colorado at Boulder and also trained at Washington University in St. Louis.