Veridical Data Science: the practice of responsible data analysis and decision-making

Campion Lecturer, 2021 Royal Statistical Society Conference

Lecture

September 7, 2021
9:00am to 9:55am
Virtual Participation

Campion Lecture at the 2021 Royal Statistical Society International Conference 
Date: Tuesday, September 7, 2021
Time: 9:00 – 9:55 PM Pacific Time

BIDS Faculty Affiliate Bin Yu has been invited to be the Campion Lecturer at this year's Royal Statistical Society International Conference. Also known as the President's Invited Lecture, this annual lecture is named after the late Sir Harry Campion, who was the first director of the Central Statistical Office, the forerunner of the Office for National Statistics. Campion was also the inaugural director of the United Nations Statistical Office and President of the Royal Statistical Society from 1957 to 1959.

Bin Yu, Chancellor's Distinguished Professor in the Departments of Statistics and of Electrical Engineering & Computer Sciences at the University of California, Berkeley, leads a group of 15 to 20 students and postdocs from Statistics and EECS. She was formally trained as a statistician, but her research interests and achievements extend beyond the realm of statistics. With her group, she has leveraged new computational developments to solve important scientific problems, combining novel statistical machine learning approaches with the domain expertise of her many collaborators in neuroscience, genomics and precision medicine. The group is also developing theory for random forests and deep learning, both to provide insight into these methods and to guide their use in practice.

Lecture Abstract: "A.I. is like nuclear energy -- both promising and dangerous" -- Bill Gates, 2019.

Data Science is a pillar of A.I. and has driven most of the recent cutting-edge discoveries in biomedical research. In practice, Data Science has a life cycle (DSLC) that includes problem formulation, data collection, data cleaning, modeling, result interpretation and the drawing of conclusions. Human judgment calls are ubiquitous at every step of this process, e.g., in choosing data cleaning methods, predictive algorithms and data perturbations. Such judgment calls are often responsible for the "dangers" of A.I. To mitigate these dangers as much as possible, we developed a framework based on three core principles: Predictability, Computability and Stability (PCS). Through a workflow and documentation (in R Markdown or Jupyter Notebook) that allows one to manage the whole DSLC, the PCS framework unifies, streamlines and expands on the best practices of machine learning and statistics, bringing us a step closer to veridical Data Science.
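As a rough illustration of how the Predictability and Stability principles can interact, the sketch below (Python with NumPy only) treats two data-cleaning choices as judgment calls, perturbs the training data by bootstrap resampling, discards fits that fail a simple held-out prediction check, and reports how often each feature ranks among the largest coefficients. Everything here is an assumption made for illustration: the function names, the outlier-clipping rule, the linear model and the "beat the training mean" screen are hypothetical stand-ins, not the PCS authors' code, not iterative random forests, and not the VeridicalFlow API.

```python
import numpy as np


def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns (intercept, coefs)."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return beta[0], beta[1:]


def clip_outliers(X, ref, q=0.99):
    """Hypothetical cleaning choice: clip each column at the q-quantile of |ref|."""
    lim = np.quantile(np.abs(ref), q, axis=0)
    return np.clip(X, -lim, lim)


# Hypothetical judgment calls for the cleaning step (reference rows = training set).
CLEANERS = {"raw": lambda X, ref: X, "clipped": clip_outliers}


def pcs_feature_stability(X, y, n_boot=50, top_k=2, seed=0):
    """Fraction of screened fits in which each feature is among the top_k
    coefficients (by magnitude), over cleaning choices x bootstrap perturbations."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    perm = rng.permutation(n)
    train, test = perm[: n // 2], perm[n // 2:]
    # Baseline for the predictability screen: always predict the training mean.
    base_mse = np.mean((y[test] - y[train].mean()) ** 2)
    counts = np.zeros(p)
    kept = 0
    for clean in CLEANERS.values():
        Xc = clean(X, X[train])
        for _ in range(n_boot):
            b = rng.choice(train, size=len(train), replace=True)  # data perturbation
            a, w = fit_linear(Xc[b], y[b])
            mse = np.mean((y[test] - (a + Xc[test] @ w)) ** 2)
            if mse >= base_mse:  # P-screen: drop fits no better than the baseline
                continue
            kept += 1
            counts[np.argsort(-np.abs(w))[:top_k]] += 1
    return counts / max(kept, 1), kept


if __name__ == "__main__":
    # Synthetic toy data: only features 0 and 1 carry signal.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 5))
    y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=200)
    freq, kept = pcs_feature_stability(X, y)
    print(kept, freq)  # features 0 and 1 should be selected in (nearly) every fit
```

A feature that is selected under only one cleaning choice, or only for some bootstrap samples, would show a frequency well below 1, which is the kind of instability the framework asks the analyst to examine and report rather than hide.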

In this lecture, we will illustrate the PCS framework through the development of iterative random forests for predictive and stable discovery of non-linear interactions, and of epiTree, a pipeline for discovering epistatic interactions in genomics data. We will also briefly discuss two ongoing PCS-driven software projects, VeridicalFlow and simChef, which aim to ease PCS-compliant data analysis and data-driven simulations, respectively.

Speaker(s)

Bin Yu

Chancellor's Distinguished Professor and Class of 1936 Second Chair, Departments of Statistics and EECS, UC Berkeley

Bin Yu is Chancellor’s Distinguished Professor and Class of 1936 Second Chair in the Departments of Statistics and of Electrical Engineering & Computer Sciences, and the Center for Computational Biology, at the University of California, Berkeley. She is an Investigator with the Weill Neurohub, a collaboration of the University of California, Berkeley (UC Berkeley), the University of California, San Francisco (UCSF), and the University of Washington (UW). She leads the Yu Group at Berkeley, which is engaged in interdisciplinary research with scientists from genomics, neuroscience, and medicine. To solve data problems in these domains, her group employs quantitative critical thinking and develops statistical and machine learning algorithms and theory. She has published more than 100 scientific papers in premier journals in statistics, machine learning, information theory, signal processing, remote sensing, neuroscience, genomics, and networks. She was a Guggenheim Fellow and President of the Institute of Mathematical Statistics (IMS), and is a member of the U.S. National Academy of Sciences and a fellow of the American Academy of Arts and Sciences.