Computational Social Science Forum — Constructing interval variables via faceted Rasch measurement and multitask deep learning: a hate speech application

CSS Training Program

November 16, 2020
12:00pm to 1:30pm
Virtual Participation


Computational Social Science Forum
Date: Monday, November 16, 2020
Time: 12:00-1:30 PM Pacific Time
Location: Register to receive the schedule and access links.

This week's session will feature an informal discussion with authors Chris J. Kennedy (Harvard Medical School), Geoff Bacon (UC Berkeley), Alexander Sahn (UC Berkeley), and Claudia von Vacano (UC Berkeley) about their paper:

Constructing interval variables via faceted Rasch measurement and multitask deep learning: a hate speech application
September 22, 2020  |  arXiv.org
Chris J. Kennedy, Geoff Bacon, Alexander Sahn, Claudia von Vacano

Abstract: We propose a general method for measuring complex variables on a continuous, interval spectrum by combining supervised deep learning with the Constructing Measures approach to faceted Rasch item response theory (IRT). We decompose the target construct, hate speech in our case, into multiple constituent components that are labeled as ordinal survey items. Those survey responses are transformed via IRT into a debiased, continuous outcome measure. Our method estimates the survey interpretation bias of the human labelers and eliminates that influence on the generated continuous measure. We further estimate the response quality of each labeler using faceted IRT, allowing responses from low-quality labelers to be removed.

Our faceted Rasch scaling procedure integrates naturally with a multitask deep learning architecture for automated prediction on new data. The ratings on the theorized components of the target outcome are used as supervised, ordinal variables for the neural networks' internal concept learning. We test the use of an activation function (ordinal softmax) and loss function (ordinal cross-entropy) designed to exploit the structure of ordinal outcome variables. Our multitask architecture leads to a new form of model interpretation because each continuous prediction can be directly explained by the constituent components in the penultimate layer.

We demonstrate this new method on a dataset of 50,000 social media comments sourced from YouTube, Twitter, and Reddit and labeled by 11,000 U.S.-based Amazon Mechanical Turk workers to measure a continuous spectrum from hate speech to counterspeech. We evaluate Universal Sentence Encoders, BERT, and RoBERTa as language representation models for the comment text, and compare our predictive accuracy to Google Jigsaw's Perspective API models, showing significant improvement over this standard benchmark.
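As a rough illustration of the faceted Rasch idea described in the abstract, the sketch below computes rating-category probabilities under the standard many-facet rating scale formulation, in which each step up the rating scale has log-odds equal to the comment's latent position minus the item difficulty, the labeler's severity, and the step threshold. The abstract does not give the authors' exact parameterization, so the function names, argument names, and numeric values here are all hypothetical and chosen only for illustration.

```python
import math


def facet_category_probs(theta, item_difficulty, rater_severity, thresholds):
    """Category probabilities for one rating under a many-facet rating scale model.

    Assumed (textbook) form, not taken from the paper:
        log(P_k / P_{k-1}) = theta - item_difficulty - rater_severity - thresholds[k-1]
    """
    # Category 0 has log-weight 0; each higher category adds one step's log-odds.
    log_weights = [0.0]
    for tau in thresholds:
        step = theta - item_difficulty - rater_severity - tau
        log_weights.append(log_weights[-1] + step)
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_weights)
    weights = [math.exp(w - peak) for w in log_weights]
    total = sum(weights)
    return [w / total for w in weights]


def expected_rating(probs):
    """Expected category index given category probabilities."""
    return sum(k * p for k, p in enumerate(probs))


# Hypothetical values: the same comment and item, rated by a lenient and a severe labeler.
thresholds = [-1.0, 0.0, 1.0]
lenient = facet_category_probs(0.5, 0.0, -1.0, thresholds)
severe = facet_category_probs(0.5, 0.0, 1.0, thresholds)
print(expected_rating(lenient), expected_rating(severe))
```

In this formulation, a more severe labeler is expected to give a lower rating to the same comment. Estimating and subtracting the severity term is one way to read the abstract's claim that labeler interpretation bias is removed from the continuous measure.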

The Computational Social Science Forum is an informal setting for the interdisciplinary exchange of ideas and scholarship at the intersection of social science and data science. Weekly meetings are hosted by researchers from BIDS and D-Lab, and participants engage in a variety of activities such as presentations of work in progress, discussions and critiques of recent papers, introductions to new tools and methods, discussions around ethics, fairness, inequality, and responsible conduct of research, as well as professional development. We welcome social science researchers with interests in data science methods and tools, and data scientists with applications or interests in public policy, social, behavioral, and health sciences. Participants include graduate students, postdocs, staff, and faculty, and members are encouraged to attend regularly in order to foster community around improving computational social science research, support the development and research of group members, and encourage new collaborations. This Forum is organized as part of the Computational Social Science Training Program. Meetings are currently held virtually on Mondays at 12:00-1:30 PM Pacific Time, and interested UC Berkeley community members are invited to use this registration form to receive the schedule and access links. Please contact css-t32@berkeley.edu for more information.

Speaker(s)

Chris Kennedy

BIDS Alumni - BIDS-BBDT Data Science Fellow

Chris Kennedy is now a postdoctoral fellow in biomedical informatics at Harvard Medical School, focusing on deep learning and causal inference in Gabriel Brat’s surgical informatics lab. He has a PhD in biostatistics from UC Berkeley. He is a senior fellow at UC Berkeley’s D-Lab and is affiliated with the Integrative Cancer Research Group and the Division of Research at Kaiser Permanente Northern California. At BIDS, he was a BIDS - Biomedical Big Data Training (BBDT) Data Science Fellow and a PhD student in biostatistics at UC Berkeley, where he worked with Alan Hubbard. He was also a D-Lab instructor and consultant, and an NIH biomedical big data trainee. His methodological interests encompassed targeted machine learning, randomized trials, causal inference, deep learning, text analysis, signal processing, and computer vision. His applications were primarily in precision medicine, public health, genomics, and election campaigns. His software projects included the SuperLearner ensemble learning system and varImpact for variable importance estimation; he leverages high performance computing on Savio and XSEDE clusters to accelerate his work. Prior to Berkeley he worked in political analytics in DC, running dozens of randomized trials and integrating machine learning into multi-million dollar programs to improve voter turnout for underrepresented Americans. He has also worked to support climate change action through Al Gore’s Climate Reality Project and the Yale Program on Climate Change Communication. He holds an M.A. in political science from UC Berkeley, an M.P.Aff. from the LBJ School of Public Affairs, and a B.A. in government & economics from The University of Texas at Austin.

Geoff Bacon

PhD Student, Linguistics, UC Berkeley

Geoff Bacon is a PhD student in the Linguistics department at UC Berkeley. His research is in computational semantics, building and evaluating statistical models of the acquisition of meaning. He teaches Python at the D-Lab.

Alexander Sahn

PhD Candidate, Political Science, UC Berkeley

Alexander Sahn is a PhD candidate in Political Science at UC Berkeley. His research seeks to understand and reduce inequalities in the American political economy. His work has been published in the American Political Science Review, Political Behavior, and Political Analysis.

Claudia von Vacano

Executive Director and Academic Coordinator, D-Lab/Digital Humanities, UC Berkeley

Dr. Claudia von Vacano is the Executive Director of the D-Lab and the Digital Humanities at Berkeley, and is on the boards of the Social Science Matrix and Berkeley Center for New Media. She has worked in policy and educational administration since 2000, and at the UC Office of the President and UC Berkeley since 2008.