D-Lab Executive Director Claudia von Vacano and BIDS Data Science Fellow Chris Kennedy co-presented this talk about an ongoing project that is now being supported by BIDS.
The Online Hate Index (OHI) is a research partnership between UC Berkeley’s D-Lab and Google Jigsaw that seeks to improve society’s understanding of online hate speech (from sources such as YouTube, Reddit, Twitter and other social media sites), including its prevalence over time, its variation across regions and demographics, our ability to measure it through crowdsourcing and algorithms, and how it has been or could be influenced by past or future interventions. Through a combination of citizen science and machine learning, the team is developing a nuanced measurement methodology that decomposes hate speech into constituent components. These components are easier to rate, evaluate and understand than a single omnibus question (e.g., "Is this comment hate speech?"), and they allow hate speech to be placed on a continuous “hate speech scale.”
The project is setting new standards for the data science of hate speech, with goals to 1) establish a theoretically grounded definition of hate speech that encompasses research, policy, and practice, 2) develop and apply a multi-component labeling instrument, 3) create a new crowdsourcing tool to label comments at scale, 4) curate an open, reliable, multi-platform corpus of labeled hate speech, 5) grow existing data and tool repositories following principles of replicable and reproducible research, enabling greater transparency and collaboration, 6) create new knowledge through ethical online experimentation (and citizen science), and 7) refine AI models. The research team includes Geoff Bacon (Linguistics Ph.D. candidate); Nora Broege (Postdoc at Rutgers University); Chris Kennedy (Biostatistics Ph.D. student, BIDS Fellow); and Alexander Sahn (Political Science Ph.D. candidate).
Ultimately, we seek to understand the causal mechanisms for intervention and evaluation, while defending free speech. A new open-source platform, to be used by the Anti-Defamation League and other advocacy organizations, will make these resources (along with policy recommendations) available to educate the public and grow the larger data science and citizen science community.
BIDS Data Science Lectures are open to the entire campus community.
This is a BIDS-supported research project.
Chris Kennedy
Chris Kennedy is now a postdoctoral fellow in biomedical informatics at Harvard Medical School, focusing on deep learning and causal inference in Gabriel Brat’s surgical informatics lab. He has a PhD in biostatistics from UC Berkeley. He is a senior fellow at UC Berkeley’s D-Lab and is affiliated with the Integrative Cancer Research Group and the Division of Research at Kaiser Permanente Northern California. At BIDS, he was a BIDS - Biomedical Big Data Training (BBDT) Data Science Fellow and a PhD student in biostatistics at UC Berkeley, where he worked with Alan Hubbard. He was also a D-Lab instructor and consultant, and an NIH biomedical big data trainee. His methodological interests encompassed targeted machine learning, randomized trials, causal inference, deep learning, text analysis, signal processing, and computer vision. His applications were primarily in precision medicine, public health, genomics, and election campaigns. His software projects included the SuperLearner ensemble learning system and varImpact for variable importance estimation, and he used high-performance computing on the Savio and XSEDE clusters to accelerate his work. Prior to Berkeley, he worked in political analytics in DC, running dozens of randomized trials and integrating machine learning into multimillion-dollar programs to improve voter turnout for underrepresented Americans. He has also worked to support climate change action through Al Gore’s Climate Reality Project and the Yale Program on Climate Change Communication. He holds an M.A. in political science from UC Berkeley, an M.P.Aff. from the LBJ School of Public Affairs, and a B.A. in government & economics from The University of Texas at Austin.
Claudia von Vacano
Claudia von Vacano is the Executive Director of D-Lab and Digital Humanities at UC Berkeley. She conceptualized the hate speech research project and serves as its principal investigator; she plays the same roles for the introduction to data science curriculum for SAGE Publications. She works as an advisor for the Data Science Education Program on Data Scholars, and she is on the boards of the Berkeley Center for New Media, the Social Science Matrix, and the Academic Innovations Studio.