In this talk, I discuss the role of qualitative and ethnographic methods in relation to computer, information, and data science. These holistic, reflexive, and meta-level approaches to studying data and computation in context help us better understand how to both support and practice data analytics at various scales.
Abstract: The collection, curation, and analysis of data has always been as much of a social challenge as it is a technical one. As the statistical techniques and computational infrastructures of artificial intelligence and data science rapidly develop, we must not forget that it is people who ultimately make research possible – even in the most fully-automated, ‘data-driven’ computational pipeline. But how do we bring human-centered perspectives and cultural contexts to data-intensive, highly-automated algorithmic systems of knowledge production? In this talk, I discuss the role of qualitative and ethnographic methods in relation to computer, information, and data science. These holistic, reflexive, and meta-level approaches to studying data and computation in context help us better understand how to both support and practice data analytics at various scales. Focusing on the human contexts of data across the pipeline brings key insights and new collaborations to various issues, such as: the career paths of those who practice and support data science; the sustainability of open source communities that develop and maintain key software tools; the governance of complex, data-driven decision-making systems; and the interpretation of findings made from large-scale analyses of social data.
Former BIDS Ethnographer Stuart Geiger is now a faculty member at the University of California, San Diego, jointly appointed in the Department of Communication and the Halıcıoğlu Data Science Institute. At BIDS, as an ethnographer of science and technology, he studied the infrastructures and institutions that support the production of knowledge. He launched the Best Practices in Data Science discussion group in 2019, having been one of the original members of the MSDSE Data Science Studies Working Group. Previously, his work on Wikipedia focused on the community of volunteer editors who produce and maintain an open encyclopedia. He also studied distributed scientific research networks and projects, including the Long-Term Ecological Research Network and the Open Science Grid. In Wikipedia and scientific research, he studied topics including newcomer socialization, community governance, specialization and professionalization, quality control and verification, cooperation and conflict, the roles of support staff and technicians, and diversity and inclusion. And, as these communities are made possible through software systems, he studied how the design of software tools and systems intersect with all of these issues. He received an undergraduate degree at UT Austin, and an MA in Communication, Culture, and Technology at Georgetown University, where he began empirically studying communities using qualitative and ethnographic methods. As part of receiving his PhD from the UC Berkeley School of Information, he worked with anthropologists, sociologists, psychologists, historians, organizational and management scholars, designers, and computer scientists.