Ethics and Empathy in Using Imputation to Disaggregate Data for Racial Equity

Berkeley Computational Social Science Forum

CSS Training Program

September 28, 2021
4:00pm to 5:00pm
Virtual Participation


Computational Social Science Forum
Date: Tuesday, September 28, 2021
Time: 4:00-5:00 PM Pacific Time
Location: Virtual Participation – Register to attend via Zoom


Speakers: Alena Stern, Senior Data Scientist, and Ajjit Narayanan, Data Scientist, Urban Institute

Abstract: Disaggregating data by race and ethnicity is a critical method for shining a light on racialized systems of privilege and oppression. Imputation is a powerful tool for disaggregation because it generates racial and ethnic identifiers for datasets that lack this information. But if used without a proactive focus on equity, it can harm Black people, Indigenous people, and other people of color. In this talk, we will share lessons we learned from a case study in which we proactively incorporated equity into imputing race and ethnicity onto a nationally representative sample of credit bureau data. We organize these lessons around a set of “ethics checkpoints” that researchers, analysts, and practitioners can use to identify and address potential racial bias and inaccuracy: checkpoint 1: before imputation, audit input data for bias; checkpoint 2: during imputation, examine where bias could be introduced at each step; and checkpoint 3: after imputation, assess whether the imputed race/ethnicity data are accurate enough to be used ethically for your analytic purpose.

The Computational Social Science Forum is an informal setting for the interdisciplinary exchange of ideas and scholarship at the intersection of social science and data science. Participants engage in a variety of activities such as presentations of work in progress, discussions and critiques of recent papers, introductions to new tools and methods, discussions around ethics, fairness, inequality, and responsible conduct of research, as well as professional development. This Forum is organized as part of the Computational Social Science Training Program, and weekly meetings are hosted by researchers from BIDS and D-Lab. The group welcomes social scientists and researchers with interests in data science methods and tools, and data scientists with applications or interests in public policy, social, behavioral, and health sciences. Participants include graduate students, postdocs, staff, and faculty, and members are encouraged to attend regularly in order to foster community around improving computational social science research, supporting the development and research of group members, and fostering new collaborations. Interested UC Berkeley community members are invited to use this registration form to receive the schedule and access links. Please contact for more information or if you are interested in presenting current research for an upcoming session.


Alena Stern

Senior Data Scientist, Urban Institute

Alena Stern is a senior data scientist at the Urban Institute studying policy solutions to advance equity and inclusion in cities. Before joining Urban, she worked as a senior program manager with AidData, an Open Cities fellow at the Sunlight Foundation, and a graduate research assistant at the Center for Data Science and Public Policy, where she used machine learning, natural language processing, statistical analysis, and geospatial data to inform the design of government policies and international development programs. Stern holds a BA in economics and international relations from the College of William and Mary and an MS in computational analysis and public policy from the University of Chicago.

Ajjit Narayanan

Data Scientist, Urban Institute

Ajjit Narayanan is a data scientist at the Urban Institute, where he works on research to advance equity and inclusion in rural and urban communities. He enjoys building open source data and mapping tools for researchers and advises on applying tools such as interactive maps, big data platforms, and data visualization methods to relevant public policy issues. Narayanan also coleads Urban’s R Users Group and Mapping Users Group to encourage the use of R and geospatial analysis for statistical analysis, data visualization, and automation. Narayanan holds a BA in economics and a BA in urban studies, along with minors in math and statistics, from the University of Pennsylvania.