Three principles for data science: predictability, stability, and computability

Berkeley Distinguished Lectures in Data Science

In this talk, I'd like to discuss the intertwined importance of, and connections among, the three principles of data science in the title as they bear on data-driven decisions. With prediction as its central task and computation at its core, machine learning has enabled wide-ranging data-driven successes. Prediction is a useful way to check against reality, and good prediction implicitly assumes stability between past and future. Stability (relative to data and model perturbations) is also a minimum requirement for the interpretability and reproducibility of data-driven results (cf. Yu, 2013), and it is closely related to uncertainty assessment. Obviously, neither the prediction nor the stability principle can be employed without feasible computational algorithms; hence the importance of computability.
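As a minimal sketch of the stability principle, one can perturb the data (here by random subsampling, on a synthetic regression problem invented for illustration) and check whether the fitted model changes much; a small spread across perturbations supports a stable, interpretable conclusion:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data (illustrative only).
n, p = 200, 3
X = rng.normal(size=(n, p))
beta = np.array([2.0, -1.0, 0.5])
y = X @ beta + 0.1 * rng.normal(size=n)

# Stability check: refit the model on random half-subsamples of the data.
estimates = []
for _ in range(50):
    idx = rng.choice(n, size=n // 2, replace=False)
    b_hat, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
    estimates.append(b_hat)
estimates = np.array(estimates)

# Spread of the estimates across perturbations; small values
# indicate the fitted coefficients are stable to data perturbation.
spread = estimates.std(axis=0)
print(spread)
```

The same perturb-and-refit recipe applies with other perturbations (bootstrap resampling, model or tuning-parameter changes) and other summaries of the fitted results.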

The three principles will be demonstrated in the context of two collaborative neuroscience projects with the Gallant Lab and through analytical connections. In particular, the first project adds stability to the predictive modeling used to reconstruct movies from fMRI brain signals, in order to gain interpretability of the predictive model. The second project uses predictive transfer learning that combines AlexNet, GoogLeNet, and VGG with single-neuron V4 data to achieve state-of-the-art prediction performance. Moreover, it provides stable functional characterization of neurons via (manifold) deep-dream images derived from the predictive models in the difficult primate visual cortex V4. Our V4 results lend support, to a certain extent, to the resemblance of these CNNs to a primate brain.
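The transfer-learning setup described above can be sketched, under simplifying assumptions, as a linear readout fitted on frozen pretrained-network features. The feature matrix below is a random stand-in (hypothetical) for activations from a layer of AlexNet/GoogLeNet/VGG, and the neuron responses are simulated; the point is the shape of the pipeline, not the actual models or data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-in for features from a frozen pretrained CNN layer;
# in the actual project these would come from AlexNet, GoogLeNet, or VGG.
n_stimuli, n_features = 300, 64
features = rng.normal(size=(n_stimuli, n_features))

# Simulated single-neuron responses driven by a sparse subset of features.
w_true = np.zeros(n_features)
w_true[:5] = rng.normal(size=5)
responses = features @ w_true + 0.1 * rng.normal(size=n_stimuli)

# Transfer learning as a ridge-regression readout on the frozen features.
lam = 1.0
A = features.T @ features + lam * np.eye(n_features)
w_hat = np.linalg.solve(A, features.T @ responses)

# Evaluate by predictive correlation on held-out stimuli.
test_feats = rng.normal(size=(100, n_features))
test_resp = test_feats @ w_true + 0.1 * rng.normal(size=100)
pred = test_feats @ w_hat
corr = np.corrcoef(pred, test_resp)[0, 1]
print(corr)
```

Only the readout weights are fit to the neural data; the pretrained features stay fixed, which is what makes the approach feasible with limited single-neuron recordings.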

The Berkeley Distinguished Lectures in Data Science, co-hosted by the Berkeley Institute for Data Science (BIDS) and the Berkeley Division of Data Sciences, features faculty doing visionary research that illustrates the character of the ongoing data, computational, inferential revolution.  In this inaugural Fall 2017 "local edition," we bring forward Berkeley faculty working in these areas as part of enriching the active connections among colleagues campus-wide.  All campus community members are welcome and encouraged to attend.  Arrive at 3:30pm for tea, coffee, and discussion.

Speaker(s)

Bin Yu

Chancellor’s Professor, Statistics Department

Bin Yu is Chancellor’s Professor in the Departments of Statistics and of Electrical Engineering & Computer Science at the University of California at Berkeley and a former Chair of Statistics at Berkeley. She is founding co-director of the Microsoft Joint Lab at Peking University on Statistics and Information Technology. Her group at Berkeley is engaged in interdisciplinary research with scientists from genomics, neuroscience, and medicine. In order to solve data problems in these domain areas, her group employs quantitative critical thinking and develops statistical and machine learning algorithms and theory. She has published more than 100 scientific papers in premier journals in statistics, machine learning, information theory, signal processing, remote sensing, neuroscience, genomics, and networks.

She is a member of the U.S. National Academy of Sciences and a fellow of the American Academy of Arts and Sciences. She was a Guggenheim Fellow in 2006, an invited speaker at ICIAM in 2011, the Tukey Memorial Lecturer of the Bernoulli Society in 2012, and the Rietz Lecturer of the Institute of Mathematical Statistics (IMS) in 2016. She was IMS president in 2013–2014, and she is a fellow of IMS, ASA, AAAS, and IEEE. She has served or is serving on leadership committees of NAS-BMSA, SAMSI, IPAM, and ICERM and on editorial boards for the Journal of Machine Learning Research, the Annals of Statistics, and the Annual Review of Statistics.