In this talk, I discuss the role of qualitative and ethnographic methods in relation to computer, information, and data science. These holistic, reflexive, and meta-level approaches to studying data and computation in context help us better understand how to both support and practice data analytics at various scales.
Abstract: The statistical techniques and computational infrastructures of artificial intelligence and data science are increasingly built into products, platforms, organizations, and institutions of all kinds. Yet the collection, curation, and analysis of data have always been as social as they are technical. Even in the most automated, “data-driven” systems, there is always human labor in designing, developing, deploying, documenting, debating, maintaining, managing, manipulating, training, triaging, translating, using, and not using such systems. In focusing on the human contexts of computation and data across the pipeline, we gain key insights into various issues across fields, as well as new possibilities for collaboratively producing knowledge. I will discuss several cases from my ethnographic research empirically studying institutions and infrastructures that support the production and distribution of knowledge. These include: how Wikipedians automate quality control while seeking to keep humans in the loop and uphold their principles of openness and decentralization; how targets of coordinated harassment campaigns on Twitter developed tools to help moderate their own experiences; the academic career paths of those who practice and support data science; the sustainability of open source communities that develop and maintain key software tools; and the interpretation of findings made from large-scale analyses of social data.
R. Stuart Geiger
Former BIDS Ethnographer Stuart Geiger is now a faculty member at the University of California, San Diego, jointly appointed in the Department of Communication and the Halıcıoğlu Data Science Institute. At BIDS, as an ethnographer of science and technology, he studied the infrastructures and institutions that support the production of knowledge. He launched the Best Practices in Data Science discussion group in 2019, having been one of the original members of the MSDSE Data Science Studies Working Group. Previously, his work on Wikipedia focused on the community of volunteer editors who produce and maintain an open encyclopedia. He also studied distributed scientific research networks and projects, including the Long-Term Ecological Research Network and the Open Science Grid. In Wikipedia and scientific research, he studied topics including newcomer socialization, community governance, specialization and professionalization, quality control and verification, cooperation and conflict, the roles of support staff and technicians, and diversity and inclusion. Because these communities are made possible through software systems, he also studied how the design of software tools and systems intersects with all of these issues. He received an undergraduate degree from UT Austin and an MA in Communication, Culture, and Technology from Georgetown University, where he began empirically studying communities using qualitative and ethnographic methods. While earning his PhD at the UC Berkeley School of Information, he worked with anthropologists, sociologists, psychologists, historians, organizational and management scholars, designers, and computer scientists.