Berkeley defining the next academic frontier: Two NSF awards to UC Berkeley set the stage for a new era of Data Science

October 17, 2017

The Berkeley Institute for Data Science (BIDS) and its partners are pleased to announce two exciting new Data Science efforts receiving simultaneous awards from the National Science Foundation (NSF). Reflecting the breadth and depth of data science at UC Berkeley, the first award will deepen the theoretical foundations of data science in a new transdisciplinary institute, while the second will strengthen educational strategies through national workshops led by the faculty and staff who have guided Berkeley’s broad-ranging data science curriculum.

The awards come as UC Berkeley moves forward in its national leadership, cemented by the growing impact of the Berkeley Institute for Data Science and the vision of integrating data science collaborations across the university with a newly created Division of Data Sciences. At NSF, the awards help signal the launch of an important new portfolio dedicated to Harnessing the Data Revolution (HDR), one of NSF’s “10 Big Ideas for Future NSF Investments.” Aligning with NSF’s new Growing Convergent Research portfolio, the awards to Berkeley focus on the deep integration of disciplines to advance discovery and innovation. “NSF has supported cross-disciplinary collaboration for decades,” said NSF Director France Córdova. “Convergence is a deeper, more intentional approach to the integration of knowledge, techniques, and expertise from multiple disciplines in order to address the most compelling scientific and societal challenges.”

Berkeley’s first award will support the creation of a new Foundations of Data Analysis (FODA) Institute, which will bring together core research communities in theoretical statistics, applied mathematics, and theoretical computer science. It will also be supported by NSF’s TRIPODS Program (Transdisciplinary Research in Principles of Data Science), which was launched to address fundamental open questions in the theoretical underpinnings of data science. “Data is accelerating the pace of scientific discovery and innovation,” said Jim Kurose, NSF Assistant Director for Computer and Information Science and Engineering (CISE). “These new TRIPODS projects will help build the theoretical foundations of data science that will enable continued data-driven discovery and breakthroughs across all fields of science and engineering.”

The FODA Institute will initially address four deep theoretical challenges: the possibility of a general complexity theory of inference in the context of optimization, the power of stability as a computational-inferential principle, the value of randomness as a statistical and algorithmic resource in data-driven computational mathematics, and the principled combination of science-based with data-driven models. These foundational problems straddle existing cultures of disciplinary research. As Principal Investigator Michael Mahoney explains, “each of these challenges is situated squarely at the interface of theoretical computer science, theoretical statistics, and applied mathematics, and the project will attempt to bridge the underlying interdisciplinary gaps to address some of the most important questions at the heart of data science today.”

Mahoney, an applied mathematician, holds affiliations with the Department of Statistics and the new RISELab in the Department of Electrical Engineering and Computer Sciences (EECS). His expertise lies in the algorithmic and statistical aspects of modern large-scale data analysis and machine learning. Mahoney will be joined by fellow UC Berkeley faculty in the Departments of Mathematics, Statistics, and EECS who are also co-PIs on the project: Richard M. Karp, Bin Yu, Fernando Perez, and Michael I. Jordan. Jordan, a distinguished professor who holds a joint appointment in the Departments of EECS and Statistics, emphasizes that this foundational approach will “draw on Berkeley’s core strengths to begin to address the many unsolved theoretical problems at the interface of computation and inference and to lay the foundations for the decades of work ahead.”

UC Berkeley’s plan for the FODA Institute builds on its track record of accomplishments in the foundations of data science. These include advanced graduate and postdoctoral training and the cross-cutting programs of the Simons Institute for the Theory of Computing. As David Culler, Interim Dean of UC Berkeley’s Division of Data Sciences, remarks, “This three-year award from NSF recognizes Berkeley’s capacity to excel in this transdisciplinary area, its promise of path-breaking results, and its ability to set directions for the research community at large.” Phase I of the TRIPODS Program amounts to a $17.7 million investment by NSF in a dozen new centers across the U.S. A subset of the TRIPODS Phase I projects will be selected to receive funding for larger institutes through a Phase II competitive process.

UC Berkeley’s second NSF award recognizes the university’s leadership in the field of data science education. Berkeley’s strongly integrative undergraduate curriculum in data science has been built from the freshman level upward and is now operating at a scale of more than two thousand students per semester. Cathryn Carson, a Professor of the History of Science and the Faculty Lead of Berkeley’s new Data Science Education Program, is the Principal Investigator on this new NSF award. Carson and her colleagues will lead two national workshops devoted to developing curricular materials firmly anchored in the actual practice of data science work. The effort will be centered on integrating insights from social scientific and educational research into teaching and learning in data science. Carson explains that an important experiential aspect of these workshops “will engage participants in forming a collaborative community with practicing data scientists, educators and social scientists.” The goal of the workshops will be to construct implementable curricular materials with practical resources (such as exercises and course modules) that will be made publicly available.

In addition to its integration into the Division of Data Sciences, the new data science curriculum award draws upon diverse centers of excellence at Berkeley, including the Center for Science, Technology, Medicine, and Society (CSTMS), the D-Lab for data-intensive social science research, and the Berkeley Institute for Data Science (BIDS). Carson’s co-investigators on the award include Interim Dean Culler, who is a faculty member in Electrical Engineering and Computer Sciences, and BIDS Director Saul Perlmutter, a professor of Physics at UC Berkeley, a senior scientist at Lawrence Berkeley National Laboratory, and a Nobel Prize winner in Physics. At Berkeley, faculty across the new field of Data Science are actively addressing and expanding convergent programs of education and research that encompass a wider variety of approaches to inquiry and understanding. Culler and Carson are leading a university-wide effort to extend the curriculum of the Division of Data Sciences, and Carson believes that the growing demand for deepened expertise in data science foundations, methodologies, and applications will define a new generation of data scientists. New undergraduate and graduate course offerings are already under way, and a Data Sciences major and minor are planned for the near future. Says Culler, “The reasons for introducing this major are manifold, and reflect Data Science’s increasingly widespread and multidimensional roles in research, industry, and society.” National attention is already growing, as indicated by Executive Vice Chancellor and Provost Paul Alivisatos’ recent presentation on Berkeley’s ambitions for breadth and diversity in data science education to a Research and Technology Subcommittee Hearing on STEM and Computer Science Education before the U.S. House Committee on Science, Space, and Technology.

BIDS Director Perlmutter also sees profound opportunities for integration. “While some data science methodologies have already been heavily integrated into a variety of disciplines – e.g. physics, statistics, mathematics and engineering – for reasons of necessity, their implications for a much broader range of questions and applications in many fields of modern data- and computation-intensive inquiry are now being more fully recognized,” he says. Perlmutter has directed BIDS since its founding in 2013, establishing it as the central Berkeley research institute bringing together an active community of world-class researchers who are leading the data science revolution within their respective disciplines.

Perlmutter welcomes the opportunity to extend BIDS into the space of the new Foundations of Data Analysis (FODA) Institute and into deeper collaboration with the Data Science Education Program. He notes an important shift – away from the data itself and toward a stronger focus on the common questions that motivate and challenge researchers in different disciplines where data plays an important role. “The questions themselves have led us to use more data, and the data can change the way we ask the questions. In fact, there are questions we wouldn’t even have asked without the data or the tools. Approaching questions with a wider arsenal of methodologies and perspectives allows for a wider range of questions, rather than the data itself driving the questions.”

As Mahoney explains, “the current campus reorganization will enable optimal leveraging of NSF funding to support a unified approach to foundational studies that will be woven directly into the fabric of interdisciplinary scientific and educational innovation, from cutting-edge research to undergraduate teaching.” Culler has already begun efforts to gather strong faculty into a vibrant and cohesive unit, and to address new challenges as they develop the foundations for successful research and educational initiatives. Remarking on the catalytic possibilities for research across the university and the excitement around Berkeley’s new NSF grants, he sees significant opportunities ahead. “While it is already an integrated and essential dimension of many existing fields of study, Data Science is emerging as a field in its own right,” says Culler. “New applications in a diverse range of disciplines will augment data science foundations as modern research becomes more data-intensive and data-rich.”