12th National Monitoring Conference
Integration of Diverse, Regional-Scale Water Data for Water Quality Analysis and Predictions
Speaker: Charuleka Varadharajan, Research Scientist, Lawrence Berkeley National Laboratory
Abstract: There has been an explosion in Earth’s water observations due to the growth of sensor-based monitoring networks, remote sensing, integrated field observatories, and other measurement approaches. Yet, our ability to utilize ‘water big data’ for scientific analysis, modeling, and decision support is limited by an inability to integrate complex, heterogeneous datasets in a timely, reproducible manner. In particular, water quality data tend to be diverse, consisting of a variety of physical, chemical, and biological measurements spread across many data sources, which makes it challenging to harmonize large datasets at watershed, regional, and continental scales.
Here, we present data-driven approaches to integrate and analyze water data at multiple spatial and temporal scales for water quality predictions. First, we describe BASIN-3D (Broker for Assimilation, Synthesis and Integration of eNvironmental Diverse, Distributed Datasets), data integration software designed to dynamically retrieve heterogeneous data from different sources and transform them into a common format that provides an integrated view. BASIN-3D gives downstream data analysis and modeling a standardized approach to data retrieval, so that individual users do not have to customize data harmonization for each data type or source. We demonstrate the application of BASIN-3D in two DOE projects that utilize vastly diverse datasets from multiple data sources to investigate the impacts of hydrologic perturbations (e.g., floods, droughts) at watershed, regional, and national scales. We then describe how the integration of these highly diverse data enables water quality predictions using machine learning models and other statistical analyses, and how tools such as BASIN-3D can lower the barrier for scientific data exploration and analysis.
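To make the broker idea concrete, the following is a minimal Python sketch of the general pattern described in the abstract: per-source adapters that transform differently shaped records into one common format. It is not BASIN-3D's actual API. The source names, record fields, and the `Observation`, `from_source_a`, `from_source_b`, and `integrate` names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """Hypothetical common format shared by all sources."""
    site_id: str
    variable: str
    timestamp: str  # ISO 8601 string, assumed UTC
    value: float
    unit: str
    source: str


def from_source_a(record: dict) -> Observation:
    # Hypothetical source A: temperature in degrees Fahrenheit under "temp_f".
    celsius = (record["temp_f"] - 32.0) * 5.0 / 9.0
    return Observation(
        site_id=f"A-{record['station']}",
        variable="water_temperature",
        timestamp=record["datetime"],
        value=round(celsius, 2),
        unit="degC",
        source="source_a",
    )


def from_source_b(record: dict) -> Observation:
    # Hypothetical source B: temperature already in Celsius, stored as a string.
    return Observation(
        site_id=f"B-{record['loc']}",
        variable="water_temperature",
        timestamp=record["time"],
        value=float(record["wtemp_c"]),
        unit="degC",
        source="source_b",
    )


# One adapter per source; supporting a new source means adding one entry here.
ADAPTERS = {"source_a": from_source_a, "source_b": from_source_b}


def integrate(raw_by_source: dict) -> list:
    """Transform raw records from every source and return one time-ordered list."""
    merged = [
        ADAPTERS[name](record)
        for name, records in raw_by_source.items()
        for record in records
    ]
    return sorted(merged, key=lambda obs: (obs.timestamp, obs.site_id))
```

In this sketch, analysis code only ever sees `Observation` objects, so unit conversions and field renaming stay inside the adapters rather than being repeated by each user.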
Authors: Charuleka Varadharajan, Deb Agarwal, Madison Burrus, Danielle C. Christianson, Boris Faybishenko, Valerie C. Hendrix, Susan Hubbard, and Helen Weierbach — Lawrence Berkeley National Laboratory, Berkeley, California
Charu is a scientist in the Energy Geosciences Division of the Earth and Environmental Sciences Area at Berkeley Lab. As a biogeochemist, she is interested in studying the nexus of carbon, water, and energy with a focus on understanding and limiting the impacts of human activities on water quality and climate. Her research involves the monitoring and mitigation of contaminants in water resources; the measurement and prediction of carbon fluxes in terrestrial and subsurface environments; and the management, synthesis, and analysis of diverse multi-scale environmental datasets. Her expertise spans various techniques for data collection and analysis, including laboratory experiments; x-ray synchrotron spectroscopy; sensor-based field data collection; web-based tools to integrate distributed datasets in real time; and the use of geoinformatics, statistical, and wavelet-based data processing to analyze high spatial and temporal resolution data. She is currently interested in applying a combination of statistical, data mining, and machine learning approaches to groundwater and related datasets in California to gain insights that can help the state manage its groundwater sustainably. She previously participated in data-driven scientific assessments of well stimulation (hydraulic fracturing) in California performed for federal and state agencies, and was part of an expert committee advising the state of California on criteria for monitoring groundwater impacted by well stimulation. Charu earned her PhD from the Massachusetts Institute of Technology and conducted her postdoctoral research at Berkeley Lab.
Deb Agarwal is the head of the Data Science and Technology Department in the Computational Research Division at Lawrence Berkeley National Lab. Deb’s research focuses on scientific tools that enable the sharing of scientific experiments, advanced networking infrastructure to support the sharing of scientific data, data analysis support infrastructure for eco-science, and cybersecurity infrastructure to secure collaborative environments.