Campion Lecture at the 2021 Royal Statistical Society International Conference
Veridical Data Science: the practice of responsible data analysis and decision-making
Date: Wednesday, September 7, 2021
Time: 9:00 – 9:55 PM Pacific Time
BIDS Faculty Affiliate Bin Yu has been invited to be the Campion Lecturer at this year's Royal Statistical Society International Conference. Also known as the President's Invited Lecture, this yearly lecture is named after the late Sir Harry Campion, who was the first director of the Central Statistical Office, the forerunner of the Office for National Statistics. Campion was also the inaugural director of the United Nations Statistical Office and the Royal Statistical Society's President from 1957 to 1959.
Bin Yu, Chancellor's Distinguished Professor in the Departments of Statistics and of Electrical Engineering & Computer Sciences at the University of California, Berkeley, leads a group of 15-20 students and postdocs from Statistics and EECS. She was formally trained as a statistician, but her research interests and achievements extend beyond the realm of statistics. Together with her group, she has leveraged new computational developments to solve important scientific problems, combining novel statistical machine learning approaches with the domain expertise of her many collaborators in neuroscience, genomics and precision medicine. The group is also developing theory to understand random forests and deep learning, with the aim of providing insight and guiding practice.
Lecture Abstract: "A.I. is like nuclear energy -- both promising and dangerous" -- Bill Gates, 2019.
Data Science is a pillar of A.I. and has driven most of the recent cutting-edge discoveries in biomedical research. In practice, Data Science has a life cycle (DSLC) that includes problem formulation, data collection, data cleaning, modeling, result interpretation and the drawing of conclusions. Human judgment calls are ubiquitous at every step of this process, e.g., in choosing data cleaning methods, predictive algorithms and data perturbations. Such judgment calls are often responsible for the "dangers" of A.I. To maximally mitigate these dangers, we developed a framework based on three core principles: Predictability, Computability and Stability (PCS). Through a workflow and documentation (in R Markdown or Jupyter Notebook) that allows one to manage the whole DSLC, the PCS framework unifies, streamlines and expands on the best practices of machine learning and statistics, bringing us a step closer to veridical Data Science.
In this lecture, we will illustrate the PCS framework through the development of iterative random forests for predictive and stable non-linear interaction discovery and that of epiTree, a pipeline to discover epistasis interactions from genomics data. We will also briefly discuss two on-going PCS-driven software developments: VeridicalFlow and simChef for ease of PCS-compliant data analysis and data-driven simulations, respectively.
Bin Yu is Chancellor's Professor in the Departments of Statistics and of Electrical Engineering & Computer Sciences at the University of California at Berkeley. She is founding co-director of the Microsoft Joint Lab at Peking University on Statistics and Information Technology. Her group at Berkeley is engaged in interdisciplinary research with scientists from genomics, neuroscience, and medicine. To solve data problems in these domains, her group employs quantitative critical thinking and develops statistical and machine learning algorithms and theory. She has published more than 100 scientific papers in premier journals in statistics, machine learning, information theory, signal processing, remote sensing, neuroscience, genomics, and networks. She is a member of the U.S. National Academy of Sciences and a fellow of the American Academy of Arts and Sciences.