preXD Text Analysis Workshop
Date: November 29, 2017, 1:00 to 6:00 PM
Location: Academic Innovation Studio (Dwinelle 117, Level D), UC Berkeley
This year, the TextXD Conference began with a preXD Workshop on November 29, from 1:00 to 6:00 PM, offering a quick introduction to text analysis in Python using Jupyter Notebooks. The session was designed to bring everyone to the same starting line so that all participants would be ready for the make sessions over the next two days. No prior text analysis experience was required. Those unfamiliar with Python were invited to review introductory materials from UC Berkeley’s D-Lab at https://github.com/dlab-berkeley/python-fundamentals; clicking the black and red “launch binder” badge runs everything in the browser.
TextXD - Text Analysis Across Domains Fall 2017 Conference
Dates: November 30 and December 1, 2017, 10:00 AM to 4:30 PM
Location: 190 Doe Library, UC Berkeley
This semester's TextXD was the largest TextXD event to date. With so much happening in natural language processing, TextXD opened its doors to researchers beyond the UC Berkeley campus. The agenda consisted of short morning talks on new tools, methods, software, and data (videos are linked in the Agenda below). Speakers came from our own campus as well as UC San Francisco, UC San Diego, UC Santa Barbara, Princeton, and Drexel. This year also introduced afternoon "make" sessions, in which participants rolled up their sleeves and worked together to craft solutions to shared problems or to investigate research questions of common interest. The text data used in the "make" sessions included newspaper articles, Twitter feeds, emails, congressional hearings, and journal article abstracts.
AGENDA / VIDEOS
THURSDAY, NOVEMBER 30 at BIDS (190 Doe Library)
- 10:00-10:10 — Nick Adams (BIDS): Welcome
- 10:10-10:35 — John Mohr (UCSB): The Frontiers of Social Scientific Text Analysis
- 10:35-10:45 — Cody Hennesy (UCB Library): Text Analysis on 14 Million Digital Library Books
- 10:45-11:15 — Julia Silge (StackOverflow): Text Mining with Tidy Data Principles and Count-based Methods
- 11:15-11:30 — Pramit Choudhary (DataScience): Explainable NLP Algorithms: Understanding Word Relevance in Text Datasets
- 11:30-11:40 — Elena Glassman (BIDS): Wavelets for Text
- 11:40-12:00 — Jamie Murdoch (UCB EECS): Beyond Word Importance: Contextual Decomposition for Interpreting LSTMs
- 12:00-12:05 — Devin Cornell (UCSB): Word Embedding and Semantic Analysis of News Data
- 12:05-12:25 — Make Session Previews
- 12:25-13:30 — Lunch
- 13:00-13:30 — Lunch Chat Panel — The Frontier of NLP (at Berkeley and Beyond)
- 13:30-17:00 — Make Session
- 17:00-19:00 - Happy Hour — Tap Haus (2516 Durant Ave)
FRIDAY, DECEMBER 1 at BIDS (190 Doe Library)
- 10:00-10:05 — Alex Paxton (BIDS): Welcome Back!
- 10:05-10:20 — Claudia von Vacano (D-Lab): Scalable Detection of Online Hate Speech
- 10:20-10:50 — Jake Ryland Williams (Drexel): Minimal Semantic Units in Text Analysis
- 10:50-11:05 — Han Zhang (Princeton): Uncovering Authoritarian Rule: Identifying Collective Action with Social Media Data
- 11:05-11:35 — Rex Douglass (UCSD): Georeferencing of Events from Text
- 11:35-11:55 — Nick Adams (BIDS): TextThresher: Qualitative Text Analysis at a Quantitative Scale
- 11:55-12:05 — Oksana Gologorskaya (UCSF): Text Analysis in Biomedical Applications at UCSF
- 12:05-12:20 — Miriam Petruck (ICSI): The FrameNet Database -- FrameNet: The Tip of the Iceberg
- 12:20-12:35 — Meredith Lee (West Big Data Innovation Hub / UC Berkeley): Collaborating with the Big Data Innovation Hubs
- 12:35-13:30 — Lunch
- 13:00-13:30 — Lunch Chat Panel — Humans In the Loop: The Role of Humans in Text Analysis
- 13:30-17:00 — Make Session
- 16:30-17:00 — Reports & wrap up - Conference Closing and Remarks from Participants
Below is the complete set of videos from the 2017 TextXD Conference, held at the Berkeley Institute for Data Science (BIDS) from November 29 through December 1, 2017.
THURSDAY, NOVEMBER 30 at BIDS (190 Doe Library)
John Mohr, University of California, Santa Barbara
The Frontiers of Social Scientific Text Analysis
Cody Hennesy, University of California, Berkeley
Text Analysis on 14 Million Digital Library Books
Julia Silge, StackOverflow
Text Mining with Tidy Data Principles and Count-based Methods
Pramit Choudhary, DataScience
Explainable NLP Algorithms: Understanding Word Relevance in Text Datasets
Elena Glassman, BIDS, University of California, Berkeley
Wavelets for Text
11:40 AM-12:00 PM
Jamie Murdoch, University of California, Berkeley
Beyond Word Importance: Contextual Decomposition for Interpreting LSTMs
Devin Cornell, University of California, Santa Barbara
Word Embedding and Semantic Analysis of News Data
Host: Marla Stuart, University of California, Berkeley
Introduction and Make Session Previews
Lunch Chat Panel — The Frontier of NLP (at Berkeley and Beyond)
1:30-5:00 PM: Make Session
FRIDAY, DECEMBER 1 at BIDS (190 Doe Library)
Claudia von Vacano, D-Lab, University of California, Berkeley
Scalable Detection of Online Hate Speech
Jake Ryland Williams, Drexel University
Minimal Semantic Units in Text Analysis
Han Zhang, Princeton University
Uncovering Authoritarian Rule: Identifying Collective Action with Social Media Data
Rex Douglass, University of California, San Diego
Georeferencing of Events from Text
Nick Adams, BIDS, University of California, Berkeley
TextThresher: Qualitative Text Analysis at a Quantitative Scale
11:55 AM-12:05 PM
Oksana Gologorskaya, University of California, San Francisco
Text Analysis in Biomedical Applications at UCSF
Miriam Petruck, International Computer Science Institute
The FrameNet Database -- FrameNet: The Tip of the Iceberg
Meredith Lee, West Big Data Innovation Hub and University of California, Berkeley
Collaborating with the Big Data Innovation Hubs
Lunch Chat Panel — Humans In the Loop: The Role of Humans in Text Analysis
Conference Closing and Remarks from Participants
Organizing Committee: Nick Adams, Marla Stuart, Alex Paxton, Chris Hench, Elena Glassman
Follow us on Twitter: #TextXD17
Nick Adams, PhD, was a full-time research fellow at the Berkeley Institute for Data Science (BIDS). He is a sociologist whose substantive work analyzes protester and police interactions as revealed through 8,000 news accounts of nearly 200 US Occupy campaigns. His TextThresher software provides the human-powered machinery to process these data in high quantity and with high quality. A builder of research communities across UC Berkeley's campus, Nick founded and leads the Computational Text Analysis Working Group at Berkeley’s D-Lab and BIDS' Text Analysis Across Domains (TextXD) initiative. He also serves on the Social Science Research Council’s Committee on Digital Culture and is a contributing editor to Mobilizing Ideas, the online journal of social movements research.
At UC Berkeley, Marla Stuart was a BIDS Data Science Fellow working with the Guizhou Berkeley Big Data Innovation Research Center (GBIC), a research hub based in Guizhou Province, China, dedicated to improving the health and well-being of China’s population. Her work with the GBIC focused on developing actionable programmatic and policy recommendations for consideration by government agencies. She led the GBIC computational lab, which collected, wrangled and modeled data from government bureaus and other sources to support the research goals of agency partners and GBIC faculty. Her own research concentrated on understanding the applicability of data science approaches in social welfare research and practice settings.
Previously, Marla had spent twenty years conducting practice-based research in public and private organizations that provide health and human services in vulnerable communities. This included fifteen years with the Navajo Nation in Arizona, where she worked with local communities to develop health and social services evaluation approaches derived from traditional Navajo philosophy and values.
Marla earned her Master of Social Work from the University of Washington in Seattle, with a focus on planned social change, and her PhD from the School of Social Welfare at Berkeley. Her dissertation explored government efforts to scale the use of evidence-based services. It used public government records together with crowd-sourced and computational data-extraction methods to create measures of these strategies, and it assessed the relative effects of these public strategies on scaling progress using time-to-event analysis. It found that county governments are well positioned to implement scaling strategies, and that both the proportion of social service providers adopting evidence-informed services and the proportion of county funding directed to these organizations can be increased. Because the study design is highly replicable, it provides a general model that other local environments can apply to identify the county levers that effectively promote the scaling of evidence-informed social services.
Alexandra Paxton is a BIDS data science fellow and a postdoctoral scholar working with Tom Griffiths in the Institute of Cognitive and Brain Sciences. She received her PhD in cognitive and information sciences from the University of California, Merced, in December 2015.
Her work explores human communication in data-rich environments. From capitalizing on large-scale real-world corpora to capturing multimodal experimental data, her research seeks to understand how context changes communication dynamics. Broadly, her work integrates computational and social perspectives to understand interpersonal interaction as a nonlinear dynamical system.
Relatedly, Alexandra also develops research methods to facilitate quantitative research on interaction and encourages others to use data-rich computational methods through teaching and service. As part of that effort, she works with the Center for Data on the Mind to foster the application of big data to questions about cognition and behavior.
Christopher Hench was a BIDS Data Science Fellow and a PhD Candidate in German Literature and Medieval Studies at UC Berkeley from 2017 to 2018. He studied computational approaches to the formal analysis of lyric and epic poetry and to reading soundscapes. More broadly, he was particularly interested in the challenges of domain adaptation for NLP and in algorithms for detecting and scoring text reuse. Christopher was also the Program Development Lead for Digital Humanities at Berkeley and the D-Lab at Berkeley, where he collaborated on several research projects and taught Python and Git workshops. He also coordinated the modules development effort in cooperation with BIDS, D-Lab, and the Data Science Education Program (DSEP).
Elena Glassman was an EECS postdoctoral researcher at the Berkeley Institute of Design, advised by Bjoern Hartmann. She earned her EECS PhD at MIT CSAIL in August 2016, where she created scalable systems that analyze, visualize, and provide insight into the code of thousands of programming students. Prior to entering the field of human-computer interaction, she earned her M.Eng. in the MIT CSAIL Robot Locomotion Group. She was a visiting researcher at the Stanford Biomimetics and Dextrous Manipulation Lab and a summer research intern at both Google and Microsoft Research, working on systems that help people teach and learn. Before receiving the BIDS Moore/Sloan Data Science Fellowship, she was awarded the Intel Foundation Young Scientist Award, both the NSF and NDSEG graduate fellowships, the MIT EECS Oral Master’s Thesis Presentation Award, a Best of CHI Honorable Mention, and the MIT Amar Bose Teaching Fellowship for innovation in teaching methods.
Chris Kennedy was a BIDS - Biomedical Big Data Training (BBDT) Data Science Fellow and a PhD student in biostatistics at UC Berkeley, where he worked with Alan Hubbard. He was also a D-Lab instructor and consultant, and an NIH biomedical big data trainee. His methodological interests encompassed targeted machine learning, randomized trials, causal inference, deep learning, text analysis, signal processing, and computer vision. His applications were primarily in precision medicine, public health, genomics, and election campaigns. His software projects included the SuperLearner ensemble learning system and varImpact for variable importance estimation, and he leveraged high performance computing on the Savio and XSEDE clusters to accelerate his work. Prior to Berkeley he worked in political analytics in DC, running dozens of randomized trials and integrating machine learning into multi-million dollar programs to improve voter turnout for underrepresented Americans. He has also worked to support climate change action through Al Gore’s Climate Reality Project and the Yale Program on Climate Change Communication. He holds an M.A. in political science from UC Berkeley, an M.P.Aff. from the LBJ School of Public Affairs, and a B.A. in government & economics from The University of Texas at Austin.