Cooking Data with Care: The Role of Contextual Inquiry in Large-Scale Quantitative Research

eScience Institute, Data Science Studies Meeting


January 23, 2019
12:30pm to 1:30pm
Seattle, WA


How do we bring human-centered perspectives and cultural contexts to data-intensive, highly automated algorithmic ways of knowing? Qualitative and quantitative methods are often imagined as orthogonal approaches that answer questions in fundamentally different, incommensurable ways. In this talk, I argue that there is often substantial qualitative contextual inquiry and expertise deployed in quantitative methods. Such insights are crucial to “cooking data with care,” as Geoff Bowker advocated. These situated knowledges are typically leveraged implicitly as analysts make key decisions about what data to use, when data needs to be calibrated, how to transform and merge data for a given purpose, how to reduce dimensionality to work with complex datasets, which variables to use as proxies, and how to interpret the validity of results. To illustrate, I share experiences with a recent publication in which we directly integrated ethnographic and qualitative methods to reproduce, extend, and ultimately contest the interpretations of a previous large-scale computational study that claimed to discover substantial levels of conflict between automated bots in Wikipedia. Our mixed-methods approach is an integrative, iterative synthesis, not a coordinated pluralism: there is a column in a dataframe that would not exist without ethnography, as well as thick descriptions of cases that were found by sampling from the extreme ends of a statistical distribution. Through my experiences in this project, I discuss several lessons learned and future directions, which touch on topics including open science, reproducibility, and data ethics.


R. Stuart Geiger

BIDS Alum – Ethnographer

Former BIDS Ethnographer Stuart Geiger is now a faculty member at the University of California, San Diego, jointly appointed in the Department of Communication and the Halıcıoğlu Data Science Institute. At BIDS, as an ethnographer of science and technology, he studied the infrastructures and institutions that support the production of knowledge. He launched the Best Practices in Data Science discussion group in 2019, having been one of the original members of the MSDSE Data Science Studies Working Group. Previously, his work on Wikipedia focused on the community of volunteer editors who produce and maintain an open encyclopedia. He also studied distributed scientific research networks and projects, including the Long-Term Ecological Research Network and the Open Science Grid. In Wikipedia and scientific research, he studied topics including newcomer socialization, community governance, specialization and professionalization, quality control and verification, cooperation and conflict, the roles of support staff and technicians, and diversity and inclusion. Because these communities are made possible through software systems, he also studied how the design of software tools and systems intersects with all of these issues. He received an undergraduate degree from UT Austin and an MA in Communication, Culture, and Technology from Georgetown University, where he began empirically studying communities using qualitative and ethnographic methods. While earning his PhD at the UC Berkeley School of Information, he worked with anthropologists, sociologists, psychologists, historians, organizational and management scholars, designers, and computer scientists.