Understanding physical processes and making environmental predictions using Deep Learning

BIDS Senior Fellow Laurel Larsen and ESDL Project Scientist Dino Bellugi are offering this project (4) through UC Berkeley's Undergraduate Research Apprentice Program (URAP) for the Spring 2021 academic semester. Eligible undergraduates may apply online January 12-25, 2021.

The Environmental Systems Dynamics Laboratory (ESDL) focuses on the interplay between biological, physical, and human aspects of the environment using a combination of physically-based and data-driven models. Research topics include how river deltas grow or shrink, how landslides move, how deforestation affects precipitation, and how to forecast the response of environmental systems under changing forcing scenarios.

This internship aims to expand on our current work exploring the use of deep learning (DL) for environmental predictions. DL methods often outperform other models (including physical ones) in making environmental predictions, but they are typically used as a “black box”, which limits the insight they give into the physical processes involved. For example, Long Short-Term Memory (LSTM) networks are extremely effective at predicting river discharge, even in snow-dominated watersheds, because they can capture the lags between the forcing and response variables. Unlike a physical model, the LSTM does not know that winter precipitation falls as snow and does not become discharge until the melt season. Yet it learns from data that the system has a memory, and it generates accurate discharge predictions from precipitation and temperature time series. Analyzing the LSTM state variables could provide insight into how the response may change under different climatic regimes.
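To make the idea of "memory" in the state variables concrete, here is a minimal sketch of a single-unit LSTM cell written with the Python standard library only. It is an illustration, not the lab's model: the weights in `PARAMS` are hand-picked rather than learned, the recurrent weights are set to zero for readability, and the six-day forcing series is invented. The weights are chosen so that the cell stores precipitation in its cell state on cold days and releases it through the output gate on warm days.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One time step of a single-unit LSTM cell.

    x       : list of input features, here [precipitation, temperature]
    h_prev  : previous hidden state (the cell's output)
    c_prev  : previous cell state (the cell's memory)
    params  : dict mapping gate name -> (input weights, recurrent weight, bias)
    """
    def pre_activation(gate):
        w_x, w_h, b = params[gate]
        return sum(w * xi for w, xi in zip(w_x, x)) + w_h * h_prev + b

    f = sigmoid(pre_activation("forget"))       # how much memory to keep
    i = sigmoid(pre_activation("input"))        # how much new input to store
    o = sigmoid(pre_activation("output"))       # how much memory to expose
    g = math.tanh(pre_activation("candidate"))  # candidate value to store
    c = f * c_prev + i * g                      # updated memory
    h = o * math.tanh(c)                        # updated output
    return h, c


# ASSUMPTION: hand-picked, untrained weights chosen only to mimic snow storage.
PARAMS = {
    "forget":    ([0.0, 0.0], 0.0, 4.0),   # always retain most of the memory
    "input":     ([0.0, -1.0], 0.0, 0.0),  # store input when temperature is low
    "output":    ([0.0, 1.0], 0.0, 0.0),   # release when temperature is high
    "candidate": ([0.5, 0.0], 0.0, 0.0),   # stored value scales with precipitation
}

# ASSUMPTION: invented forcing, three cold wet days then three warm dry days.
precip = [2.0, 2.0, 2.0, 0.0, 0.0, 0.0]
temp = [-5.0, -5.0, -5.0, 10.0, 10.0, 10.0]

h, c = 0.0, 0.0
hidden_states, cell_states = [], []
for p, t in zip(precip, temp):
    h, c = lstm_step([p, t], h, c, PARAMS)
    hidden_states.append(h)
    cell_states.append(c)

for day, (h_t, c_t) in enumerate(zip(hidden_states, cell_states), start=1):
    print(f"day {day}: cell state = {c_t:.3f}, output = {h_t:.3f}")
```

With these weights the cell state builds up over the cold days while the output stays near zero, and the output rises on the warm days even though precipitation is zero. In a trained network the gates are learned from data, so inspecting the cell states over time is one possible way to look for this kind of storage-and-release behavior.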

Similar applications include the prediction of subsurface pore-water pressure development as a function of precipitation for the prediction of landslide displacement. A major challenge in evaluating landslide hazard is predicting the evolution of shallow pore-water pressure in the ground over time. This is due both to insufficient monitoring and to the difficulty of modeling infiltration of rainfall into unsaturated ground, which is strongly dependent on the soil moisture content. While an LSTM is able to make good predictions, we would like to learn how the response times of pore-water pressure to rainfall change across time and space. We also want to explore how transferable DL methods are across different landscape or climate gradients, as transferability is essential in developing larger-scale models that are able to, for example, forecast the activation of landslides across California.
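As a point of reference for what a "response time" means here, the sketch below estimates the lag between a forcing series and a response series with a plain lagged Pearson correlation. This is a simple non-DL baseline, not the method the project proposes; the function names `pearson` and `best_lag` are hypothetical, and the rainfall and pore-pressure series are synthetic, constructed so that the response trails the forcing by exactly three steps.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0  # a constant series carries no lag information
    return cov / math.sqrt(var_a * var_b)


def best_lag(forcing, response, max_lag):
    """Return the lag (in time steps) that maximizes the correlation
    between forcing[t] and response[t + lag], plus all lag scores."""
    scores = {}
    for lag in range(max_lag + 1):
        shifted_forcing = forcing[:len(forcing) - lag]
        shifted_response = response[lag:]
        scores[lag] = pearson(shifted_forcing, shifted_response)
    return max(scores, key=scores.get), scores


# ASSUMPTION: synthetic rainfall, and a pore-pressure response that is
# simply half the rainfall delayed by three time steps.
rain = [0, 5, 0, 0, 2, 0, 0, 0, 8, 0, 0, 1, 0, 0, 0, 4, 0, 0, 0, 0]
true_lag = 3
pressure = [0.0] * true_lag + [0.5 * r for r in rain[:len(rain) - true_lag]]

lag, scores = best_lag(rain, pressure, max_lag=6)
print("estimated lag:", lag)
```

A single fixed lag like this cannot describe a response that depends on soil moisture or season, which is part of the motivation for examining what an LSTM has learned instead.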

Students will work with a variety of time series data from intensely monitored Critical Zone observatories, as well as from state and national discharge and precipitation datasets. They will work collaboratively to develop DL models and to interpret the LSTM state variables and their relative importance. Example tasks involved in this project:

  • Clean, aggregate, and format time series data
  • Experiment with diverse DL architectures
  • Train, calibrate, and optimize DL models
  • Analyze relative importance of predictor variables
  • Analyze LSTM state variables to infer memory length and threshold behaviors
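The first task in the list above (clean, aggregate, and format time series data) can be sketched as follows, using only the Python standard library. Everything here is an assumption made for illustration: the helper names `fill_gaps`, `aggregate`, and `make_windows`, the choice of linear interpolation for temperature gaps, the block size, and the small invented series. Real observatory data would need its own gap-handling rules.

```python
from statistics import mean


def fill_gaps(values):
    """Fill None entries: linear interpolation between the nearest valid
    neighbors, or a copy of the nearest valid value at the series edges."""
    valid = [k for k, v in enumerate(values) if v is not None]
    if not valid:
        raise ValueError("series has no valid observations")
    filled = list(values)
    for k, v in enumerate(values):
        if v is not None:
            continue
        left = max((j for j in valid if j < k), default=None)
        right = min((j for j in valid if j > k), default=None)
        if left is None:
            filled[k] = values[right]
        elif right is None:
            filled[k] = values[left]
        else:
            weight = (k - left) / (right - left)
            filled[k] = values[left] + weight * (values[right] - values[left])
    return filled


def aggregate(values, block, how):
    """Aggregate consecutive blocks of `block` samples with `how`
    (for example sum or mean); a trailing partial block is dropped."""
    n_blocks = len(values) // block
    return [how(values[b * block:(b + 1) * block]) for b in range(n_blocks)]


def make_windows(features, target, lookback):
    """Pair each target[t] with the `lookback` feature rows ending at t."""
    inputs, outputs = [], []
    for t in range(lookback - 1, len(target)):
        inputs.append(features[t - lookback + 1:t + 1])
        outputs.append(target[t])
    return inputs, outputs


# ASSUMPTION: invented sub-daily records, three samples per aggregated step.
precip_raw = [0.0, 1.0, 2.0, 3.0, 0.0, 0.0, 2.0, 2.0, 1.0, 0.0, 0.0, 0.0]
temp_raw = [2.0, None, 4.0, 5.0, 5.0, 5.0, None, 1.0, 1.0, 0.0, 0.0, None]
discharge = [0.1, 0.4, 0.6, 0.5]  # already at the aggregated resolution

precip_agg = aggregate(precip_raw, block=3, how=sum)          # totals
temp_agg = aggregate(fill_gaps(temp_raw), block=3, how=mean)  # averages
features = list(zip(precip_agg, temp_agg))

X, y = make_windows(features, discharge, lookback=2)
print(len(X), "training windows of length", len(X[0]))
```

Precipitation is summed and temperature is averaged because one is a flux and the other a state; the windowed pairs in `X` and `y` have the sequence-in, value-out shape that recurrent models such as LSTMs typically expect.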

Learning outcomes include:

  • Achieving an improved understanding of environmental systems and hydrology in particular
  • Improving data processing skills, including time series analyses
  • Becoming familiar with major issues in hazard forecasting and the underlying science
  • Gaining hands-on experience with data-driven approaches to hillslope geomorphology and catchment hydrology

Qualifications: This project will be of interest to students in Computer Science, Data Science, and Statistics (though students from other majors are also welcome to apply) who have an interest in applying their Machine Learning (ML) experience to the domains of Earth and Planetary Sciences, Geography, and Civil and Environmental Engineering.

Required: Students should have programming capabilities in Python and/or MATLAB, with applications in ML. Familiarity with ML libraries such as Scikit-Learn, Keras, and the MATLAB Statistics and Machine Learning Toolbox is desirable. Students should demonstrate a strong ML background, highlighting courses they have taken and applications they have developed. Students should be willing to work as a member of a research team and have strong communication skills.

BIDS Affiliates

Laurel Larsen

Geography, UC Berkeley
Faculty Affiliate