Causal Inference in the Age of Big Data

Berkeley Distinguished Lectures in Data Science

The rise of massive datasets that provide fine-grained information about human beings and their behavior offers unprecedented opportunities for evaluating the effectiveness of social, behavioral, and medical treatments. With such data available, researchers and policymakers are increasingly dissatisfied with estimates of average treatment effects based on experimental samples that are unrepresentative of the populations of interest. Instead, they seek to target treatments to particular populations and subgroups. To meet these inferential challenges, machine learning (ML) is now being used to evaluate and predict the effectiveness of interventions in a wide range of domains, from technology firms to clinical medicine and election campaigns. However, a number of issues arise when ML is used for causal inference. For example, although ML and related statistical models are good at prediction, they are not designed to estimate causal effects; they focus on predicting observed outcomes. Treatment effects, however, are never directly observed, and creating validation datasets where ground truth is known is difficult. Such validation is particularly important because, although ML algorithms have been designed to overcome prediction challenges when the data-generating process is unknown, they cannot overcome bias when treatment assignment is a function of variables that are not observed. In this talk, Dr. Sekhon will discuss some recent methodological developments and examples of using ML methods to draw causal inferences.
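The point about unobserved variables can be illustrated with a small simulation. The sketch below is not drawn from the talk; it is a minimal illustration under assumed conditions: a single unobserved confounder `u`, a constant true treatment effect of 1.0, and Gaussian noise. The function name `difference_in_means` and all parameter values are hypothetical choices made for this example. It compares a simple difference in mean outcomes when treatment depends on `u` with the same estimate when treatment is assigned by a coin flip.

```python
import random
import statistics


def difference_in_means(n, true_effect, randomized, seed=0):
    """Simulate n units and return mean(treated outcomes) - mean(control outcomes).

    Assumed data-generating process (illustrative only):
      u ~ N(0, 1) is a confounder the analyst never observes,
      y = true_effect * t + 2 * u + noise.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)  # unobserved confounder
        if randomized:
            # Treatment assigned by a fair coin, independent of u.
            t = rng.random() < 0.5
        else:
            # Treatment is more likely when u is large.
            t = (u + rng.gauss(0.0, 1.0)) > 0
        y = true_effect * t + 2.0 * u + rng.gauss(0.0, 1.0)
        (treated if t else control).append(y)
    return statistics.fmean(treated) - statistics.fmean(control)


TRUE_EFFECT = 1.0
confounded_estimate = difference_in_means(20000, TRUE_EFFECT, randomized=False)
randomized_estimate = difference_in_means(20000, TRUE_EFFECT, randomized=True)

print(f"true effect:          {TRUE_EFFECT:.2f}")
print(f"confounded estimate:  {confounded_estimate:.2f}")
print(f"randomized estimate:  {randomized_estimate:.2f}")
```

Under these assumptions, the confounded estimate comes out well above the true effect because treated units also tend to have larger values of `u`, while the randomized estimate lands close to the true effect. Since `u` is never recorded, a model that predicts the observed outcome from the available variables has no way to remove this bias, however flexible it is.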

The Berkeley Distinguished Lectures in Data Science, co-hosted by the Berkeley Institute for Data Science (BIDS) and the Berkeley Division of Data Sciences, features faculty doing visionary research that illustrates the character of the ongoing data, computational, and inferential revolution. In this inaugural Fall 2017 "local edition," we bring forward Berkeley faculty working in these areas as part of enriching the active connections among colleagues campus-wide. All campus community members are welcome and encouraged to attend. Arrive at 3:30pm for tea, coffee, and discussion.

Speaker(s)

Jasjeet S. Sekhon

Professor, Political Science and Statistics

Jasjeet S. Sekhon is Professor of Political Science and Statistics at the University of California, Berkeley. His current research focuses on methods for causal inference in observational and experimental studies and on evaluating social science, public health, and medical interventions. Professor Sekhon has done research on elections, voting behavior, and public opinion in the United States; multivariate matching methods for causal inference; machine learning algorithms for irregular optimization problems; robust estimators with bounded influence functions; health-economic cost-effectiveness analysis; and the philosophy and history of inference and statistics in the social sciences.