2017 TextXD Conference

Text Analysis Across Domains


December 1, 2017
9:00am to 5:00pm
190 Doe Library

preXD Text Analysis Workshop
Date: November 29, 2017, 1:00 to 6:00 PM
Location: Academic Innovation Studio (Dwinelle 117, Level D), UC Berkeley 

This year, the TextXD Conference began with a preXD Workshop on November 29 from 1:00 to 6:00 PM, which offered a quick introduction to text analysis in Python using Jupyter Notebooks. The session was designed to bring everyone to the same starting line in text analysis so that all participants would be ready for the make sessions over the next two days. No prior text analysis experience was needed to attend the preXD. Those unfamiliar with Python were invited to work through introductory materials from UC Berkeley’s D-Lab at https://github.com/dlab-berkeley/python-fundamentals (click the black and red “launch binder” badge to run them in your browser).

TextXD - Text Analysis Across Domains Fall 2017 Conference
Dates: November 30 and December 1, 2017, 10:00 AM to 4:30 PM 
Location: 190 Doe Library, UC Berkeley 

This semester's TextXD event was the largest TextXD event to date. With so much happening in natural language processing, TextXD opened its doors to researchers beyond the UC Berkeley campus. The agenda consisted of short morning talks on new tools, methods, software, and data (see the videos in the agenda below). Speakers came from our own campus as well as UC San Francisco, UC San Diego, UC Santa Barbara, Princeton, and Drexel. Afternoon "make" sessions were also introduced this year so that participants could roll up their sleeves and work together to craft solutions to shared problems or to investigate research questions of common interest. The text data used in the make sessions included newspaper articles, Twitter feeds, emails, congressional hearings, and journal article abstracts.


Below is the complete set of videos from the 2017 TextXD Conference, held at the Berkeley Institute for Data Science (BIDS) from November 29 through December 1, 2017.

THURSDAY, NOVEMBER 30 at BIDS (190 Doe Library)

10:10-10:35 AM
John Mohr, University of California, Santa Barbara
The Frontiers of Social Scientific Text Analysis


10:35-10:45 AM
Cody Hennesy, University of California, Berkeley
Text Analysis on 14 Million Digital Library Books


10:45-11:15 AM
Julia Silge, Stack Overflow
Text Mining with Tidy Data Principles and Count-based Methods


11:15-11:30 AM
Pramit Choudhary, DataScience
Explainable NLP Algorithms: Understanding Word Relevance in Text Datasets


11:30-11:40 AM
Elena Glassman, BIDS, University of California, Berkeley
Wavelets for Text


11:40 AM-12:00 PM
Jamie Murdoch, University of California, Berkeley
Beyond Word Importance: Contextual Decomposition for Interpreting LSTMs


12:00-12:05 PM
Devin Cornell, University of California, Santa Barbara
Word Embedding and Semantic Analysis of News Data


12:05-12:25 PM
Host: Marla Stuart, University of California, Berkeley
Introduction and Make Session Previews


1:00-1:30 PM
Lunch Chat Panel — The Frontier of NLP (at Berkeley and Beyond)


1:30-5:00 PM
Make Session

FRIDAY, DECEMBER 1 at BIDS (190 Doe Library)

10:05-10:20 AM
Claudia von Vacano, D-Lab, University of California, Berkeley
Scalable Detection of Online Hate Speech


10:20-10:50 AM
Jake Ryland Williams, Drexel University
Minimal Semantic Units in Text Analysis


10:50-11:05 AM
Han Zhang, Princeton University
Uncovering Authoritarian Rule: Identifying Collective Action with Social Media Data


11:05-11:35 AM
Rex Douglass, University of California, San Diego
Georeferencing of Events from Text


11:35-11:55 AM 
Nick Adams, BIDS, University of California, Berkeley
TextThresher: Qualitative Text Analysis at a Quantitative Scale 


11:55 AM-12:05 PM
Oksana Gologorskaya, University of California, San Francisco 
Text Analysis in Biomedical Applications at UCSF


12:05-12:20 PM
Miriam Petruck, International Computer Science Institute
The FrameNet Database -- FrameNet: The Tip of the Iceberg


12:20-12:35 PM
Meredith Lee, West Big Data Innovation Hub and University of California, Berkeley
Collaborating with the Big Data Innovation Hubs


1:00-1:30 PM 
Lunch Chat Panel — Humans In the Loop: The Role of Humans in Text Analysis


4:30-5:00 PM
Conference Closing and Remarks from Participants


Organizing Committee: Nick Adams, Marla Stuart, Alex Paxton, Chris Hench, Elena Glassman



Follow us on Twitter: #TextXD17 


Nick Adams

BIDS Alum – Research Fellow, Social Science

Former BIDS Research Fellow Nick Adams, PhD, is now Founder & Chief Scientist of Goodly Labs, an organization that provides collaborative online resources and opportunities that enable citizen scientists to engage with publicly available data. He is a sociologist, data scientist, and creator who builds tools and experiences that help people find common ground and build a better society. In a career motivated by his aspiration to improve the world, Adams has led electoral campaigns, directed the national security division of a think tank, completed groundbreaking research on police/protester interactions, constructed and shared massive and intricate datasets, invented new natural language processing methodologies and collaborative software, and taught hundreds of students in subjects including classical and contemporary social theory, social science methods, social psychology, political sociology, deviance and social control, and text analysis. Adams has founded and led several successful organizations that are still active, including Thusly Inc., UC Berkeley's Text Analysis Across Domains (TextXD), the Computational Text Analysis Working Group, and his nonprofit Goodly Labs, the sociotechnical skunkworks behind Public Editor, Demo Watch, Research Ready, and Same Page. His work has appeared in academic journals as well as The New York Times, Roll Call, The Atlantic, and Reader's Digest. He has been funded by the Alfred P. Sloan Foundation, the McCune Foundation, Schmidt Futures, the Berkeley Institute for Data Science, the Pritzker Family Fund, SAGE Publishing, the Social Science Research Council, and the National Science Foundation.


Marla Stuart

BIDS Alum – Data Science Fellow

At UC Berkeley, Marla Stuart was a BIDS Data Science Fellow working with the Guizhou Berkeley Big Data Innovation Research Center (GBIC), a research hub based in Guizhou Province, China, dedicated to improving the health and well-being of China’s population. Her work with the GBIC focused on developing actionable program and policy recommendations for government agencies to consider. She led the GBIC computational lab, which collected, wrangled, and modeled data from government bureaus and other sources to support the research goals of agency partners and GBIC faculty. Her own research focused on the applicability of data science approaches in social welfare research and practice settings.

Previously, Marla spent twenty years conducting practice-based research in public and private organizations that provide health and human services in vulnerable communities. This included fifteen years with the Navajo Nation in Arizona, where she worked with local communities to develop approaches to evaluating health and social services that drew on traditional Navajo philosophy and values.

Marla earned her Master of Social Work from the University of Washington in Seattle with a focus on planned social change, and her PhD from the School of Social Welfare at Berkeley. Her dissertation examined government efforts to scale the use of evidence-based services. It combined public government records with crowdsourced and computational data-extraction methods to construct measures of these scaling strategies, and it used time-to-event analysis to assess the relative effects of these public strategies on scaling progress. It found that county governments are well positioned to implement scaling strategies, and that both the proportion of social service providers adopting evidence-informed services and the proportion of county funding directed to these organizations can be increased. Because the study design is highly replicable, it offers a general model for identifying, in other localities, common county levers that effectively promote the scaling of evidence-informed social services.

Alexandra Paxton

BIDS Alum – Data Science Fellow

Alexandra Paxton was a BIDS Data Science Fellow and a postdoctoral scholar working with Tom Griffiths in the Institute of Cognitive and Brain Sciences at UC Berkeley. She received her PhD in cognitive and information sciences from the University of California, Merced, in December 2015.

Her work explores human communication in data-rich environments. From capitalizing on large-scale real-world corpora to capturing multimodal experimental data, her research seeks to understand how context changes communication dynamics. Broadly, her work integrates computational and social perspectives to understand interpersonal interaction as a nonlinear dynamical system.

Relatedly, Alexandra also develops research methods to facilitate quantitative research on interaction and encourages others to use data-rich computational methods through teaching and service. As part of that effort, she works with the Center for Data on the Mind to foster the application of big data to questions about cognition and behavior.

Christopher Hench

BIDS Alum – Data Science Fellow

Christopher Hench was a BIDS Data Science Fellow and a PhD Candidate in German Literature and Medieval Studies at UC Berkeley from 2017 to 2018. He studied computational approaches to the formal analysis of lyric and epic poetry and to reading soundscapes. More broadly, he was interested in the challenges of domain adaptation for NLP and in algorithms for detecting and scoring text reuse. Christopher was also the Program Development Lead for Digital Humanities at Berkeley and the D-Lab at Berkeley, where he collaborated on several research projects and taught Python and Git workshops. He also coordinated the module development effort in cooperation with BIDS, D-Lab, and the Data Science Education Program (DSEP).

Elena Glassman

BIDS Alum – Former BIDS Data Science Fellow (EECS)

Former BIDS Data Science Fellow Elena Glassman is an Assistant Professor of Computer Science at the Harvard Paulson School of Engineering & Applied Sciences (SEAS), and the Stanley A. Marks & William H. Marks Professor at the Radcliffe Institute for Advanced Study. At Berkeley, Glassman was an EECS postdoctoral researcher at the Berkeley Institute of Design, advised by Bjoern Hartmann. She earned her EECS PhD at MIT CSAIL in August 2016, where she created scalable systems that analyze, visualize, and provide insight into the code of thousands of programming students. Prior to entering the field of human-computer interaction, she earned her M.Eng. in the MIT CSAIL Robot Locomotion Group. She was a visiting researcher at the Stanford Biomimetics and Dextrous Manipulation Lab and a summer research intern at both Google and Microsoft Research, working on systems that help people teach and learn. Before receiving the BIDS Moore/Sloan Data Science Fellowship, she was awarded the Intel Foundation Young Scientist Award, both the NSF and NDSEG graduate fellowships, the MIT EECS Oral Master’s Thesis Presentation Award, a Best of CHI Honorable Mention, and the MIT Amar Bose Teaching Fellowship for innovation in teaching methods.

Chris Kennedy

BIDS Alum – Data Science Fellow

Chris Kennedy is an instructor in psychiatry at Harvard Medical School / Massachusetts General Hospital. He has a PhD in biostatistics from UC Berkeley. He is a senior fellow at UC Berkeley’s D-Lab and is affiliated with the Integrative Cancer Research Group and the Division of Research at Kaiser Permanente Northern California. At BIDS, he was a BIDS Biomedical Big Data Training (BBDT) Data Science Fellow and a PhD student in biostatistics at UC Berkeley, where he worked with Alan Hubbard. He was also a D-Lab instructor and consultant and an NIH biomedical big data trainee. His methodological interests encompassed targeted machine learning, randomized trials, causal inference, deep learning, text analysis, signal processing, and computer vision, with applications primarily in precision medicine, public health, genomics, and election campaigns. His software projects included the SuperLearner ensemble learning system and varImpact for variable importance estimation, and he used high-performance computing on the Savio and XSEDE clusters to accelerate his work. Before Berkeley, he worked in political analytics in Washington, DC, running dozens of randomized trials and integrating machine learning into multimillion-dollar programs to improve voter turnout among underrepresented Americans. He has also worked to support climate change action through Al Gore’s Climate Reality Project and the Yale Program on Climate Change Communication. He holds an M.A. in political science from UC Berkeley, an M.P.Aff. from the LBJ School of Public Affairs, and a B.A. in government & economics from The University of Texas at Austin.