Three principles for data science: predictability, stability, and computability

Berkeley Distinguished Lectures in Data Science

Lecture

September 12, 2017
4:10pm to 5:00pm
190 Doe Library

In this talk, I'd like to discuss the importance of, and the connections among, the three principles of data science named in the title, as they bear on data-driven decisions. With prediction as its central task and computation at its core, machine learning has enabled wide-ranging data-driven successes. Prediction is a useful way to check against reality. Good prediction implicitly assumes stability between past and future. Stability (relative to data and model perturbations) is also a minimum requirement for interpretability and reproducibility of data-driven results (cf. Yu, 2013). It is closely related to uncertainty assessment. Neither the prediction principle nor the stability principle can be employed without feasible computational algorithms, hence the importance of computability.

The three principles will be demonstrated in the context of two collaborative neuroscience projects with the Gallant Lab and through analytical connections. In particular, the first project adds stability to the predictive modeling used to reconstruct movies from fMRI brain signals, in order to gain interpretability of the predictive model. The second project uses predictive transfer learning that combines AlexNet, GoogLeNet, and VGG with single V4 neuron data for state-of-the-art prediction performance. Moreover, it provides stable function characterization of neurons via (manifold) deep dream images from the predictive models in V4, a difficult area of the primate visual cortex. Our V4 results lend support, to a certain extent, to the resemblance of these CNNs to a primate brain.

The Berkeley Distinguished Lectures in Data Science, co-hosted by the Berkeley Institute for Data Science (BIDS) and the Berkeley Division of Data Sciences, features faculty doing visionary research that illustrates the character of the ongoing data, computational, and inferential revolution. In this inaugural Fall 2017 "local edition," we bring forward Berkeley faculty working in these areas as part of enriching the active connections among colleagues campus-wide. All campus community members are welcome and encouraged to attend. Arrive at 3:30pm for tea, coffee, and discussion.

Speaker(s)

Bin Yu

Professor and Second Chair, Departments of Statistics and EECS, UC Berkeley

Bin Yu is Chancellor’s Distinguished Professor and Class of 1936 Second Chair in the Departments of Statistics and of Electrical Engineering & Computer Science, and in the Center for Computational Biology, at the University of California, Berkeley. She is an Investigator with the Weill Neurohub, a collaboration of the University of California, Berkeley (UC Berkeley), the University of California, San Francisco (UCSF), and the University of Washington (UW). She leads the Yu Group at Berkeley, which is engaged in interdisciplinary research with scientists from genomics, neuroscience, and medicine. To solve data problems in these domain areas, her group employs quantitative critical thinking and develops statistical and machine learning algorithms and theory. She has published more than 100 scientific papers in premier journals in statistics, machine learning, information theory, signal processing, remote sensing, neuroscience, genomics, and networks. She was a Guggenheim Fellow and President of the Institute of Mathematical Statistics (IMS), and is a member of the U.S. National Academy of Sciences and a fellow of the American Academy of Arts and Sciences.