Cooking Data with Care: Qualitative and Quantitative Studies of Wikipedia

ACM International Symposium on Open Collaboration (OpenSym)


August 23, 2018
Paris, France

BIDS Ethnographer Stuart Geiger and his colleague Aaron Halfaker presented this keynote address at the ACM International Symposium on Open Collaboration (OpenSym), in Paris, France, on August 23, 2018.

Abstract: In contemporary discussions about data science, two often-parallel conversations about how we ought to do data science have been taking place. Conversations about "reproducibility" tend to focus on issues of scientific rigor and validity, while those about "data ethics" instead often focus on issues of social impact and human rights. Both are ways in which we talk about what it means to do data science well, but to what extent are these the same kinds of issues, and to what extent are they orthogonal to each other? As an example to think through, I discuss a recent publication in which we reproduced and critiqued an existing social science publication. We did not contest their statistical analyses, but instead took issue with how they computationally defined a culturally specific concept and how they subsequently interpreted their findings. As data science is increasingly used to study social and political topics, to what extent is the computational operationalization of complex social and political concepts an ethical issue in both the frames of scientific validity and social impact?



R. Stuart Geiger

BIDS Alum – Ethnographer

Former BIDS Ethnographer Stuart Geiger is now a faculty member at the University of California, San Diego, jointly appointed in the Department of Communication and the Halıcıoğlu Data Science Institute. At BIDS, as an ethnographer of science and technology, he studied the infrastructures and institutions that support the production of knowledge. He launched the Best Practices in Data Science discussion group in 2019, having been one of the original members of the MSDSE Data Science Studies Working Group. Previously, his work on Wikipedia focused on the community of volunteer editors who produce and maintain an open encyclopedia. He also studied distributed scientific research networks and projects, including the Long-Term Ecological Research Network and the Open Science Grid. In both Wikipedia and scientific research, he studied topics including newcomer socialization, community governance, specialization and professionalization, quality control and verification, cooperation and conflict, the roles of support staff and technicians, and diversity and inclusion. Because these communities are made possible through software systems, he also studied how the design of software tools and systems intersects with all of these issues. He received an undergraduate degree from UT Austin and an MA in Communication, Culture, and Technology from Georgetown University, where he began empirically studying communities using qualitative and ethnographic methods. While earning his PhD at the UC Berkeley School of Information, he worked with anthropologists, sociologists, psychologists, historians, organizational and management scholars, designers, and computer scientists.

Aaron Halfaker

Computer Scientist
Wikimedia Foundation

Aaron Halfaker is a computer scientist and a principal research scientist at the Wikimedia Foundation. He earned a Ph.D. in computer science from the GroupLens research lab at the University of Minnesota in 2013. He is known for his research on Wikipedia and the decline in the number of its active editors. He also built an artificial intelligence engine known as the "Objective Revision Evaluation Service" (ORES for short), which is used to identify vandalism on Wikipedia and distinguish it from good-faith edits.