TextXD 2020 — Text Analysis Across Domains
Dates: December 10-12, 2020
This year’s program was free to attend, presented virtually, and open to a global audience. All scholars, practitioners, learners, and entrepreneurs who engage with text analysis in their research were welcome and encouraged to register. Lightning Talk abstracts were due by November 1, 2020, and Lightning Talk videos by November 15, 2020.
TextXD 2020 convened an interdisciplinary group of practitioners, researchers, learners, and entrepreneurs who work with text as a primary source of data, and who use computational text analysis in a wide range of disciplines. This year’s 3-day conference featured invited speakers, panel discussions, and exciting research talks spanning theory, applications, and tools. Participants were invited to engage actively, learn collaboratively, and deepen their expertise in text analysis by sharing their approaches, perspectives, and solutions, and by supporting each other in their practice.
Talks ranged from the theory of text analysis and deep learning to applied analyses and new software packages. The main conference event featured a series of online video presentations from this year’s speakers, leaders in text analysis innovation across domains including Law and Society, Under-represented Languages in NLP, and Health and Environmental Issues.
PROGRAM AGENDA with links to VIDEOS
DAY 1: THURSDAY, December 10, 2020
11:00 AM - 6:00 PM Pacific
11:00 AM -- Welcome and Introductions
11:30 AM -- Session 1: Under-resourced Languages & Literature
- Building a portal for exploring social group identities and semantic domains in under-resourced languages – Krister Lindén, Research Director, Department of Digital Humanities, University of Helsinki, Finland
- Etaoin … Shrdlu … Cmfwyp? In Search of Lexical Networks and Cul-de-Sac Words – Matthew J. Lavin, Assistant Professor of Humanities Analytics, Data Analytics Program, Denison University
- LitBank: Born-Literary Natural Language Processing – David Bamman, Assistant Professor, School of Information, UC Berkeley
1:00 PM -- Session 2: NLP in the Social Sciences
- Beyond Bag-of-Words: Detecting and Using Meaningful Multi-word Expressions – Ken Benoit, Professor of Computational Social Science, Department of Methodology, London School of Economics and Political Science
- Faceted Rasch scaling and multitask deep learning for debiased, explainable, continuous measurement of hate speech – Chris Kennedy, Postdoctoral Fellow, Surgical Informatics, Harvard Medical School
- And the Rest is History – Laura Nelson, Assistant Professor of Sociology, College of Social Sciences and Humanities, Northeastern University
2:30 PM -- Session 3: NLP in theory and application
- Moving Away from One-Size-Fits-All Language Representations – Rada Mihalcea, Professor of Computer Science and Engineering; and Director, Michigan Artificial Intelligence Lab; University of Michigan
- Sure, transformers are cool… but have you tried rules? – Rachael Tatman, Senior Developer Advocate, Rasa
3:45 PM -- Lightning Talks Session 1
4:45 PM -- Social Hour
DAY 2: FRIDAY, December 11, 2020
11:00 AM - 3:45 PM Pacific
11:00 AM -- Welcome and Introductions
11:10 AM -- Session 4: Health & Life Sciences
- Leveraging multi-language word embedding to improve document searches – Nishitha Kambhaladinne, Data Science Analyst; and Ye-Jin (Jenna) Eun, Principal Data Scientist; Commercial Data Sciences and Data Management Team, Janssen/Johnson & Johnson
- Interpretability in clinical text mining – Irena Spasic, Professor, Text and Data Mining, Cardiff School of Computer Science and Informatics
- Conversation summarization: a neural-symbolic linguistic pipeline architecture for medical reporting – Sjaak Brinkkemper, Professor of Software Production, Department of Information and Computing Sciences, Utrecht University, the Netherlands
1:00 PM -- Session 5: Real World NLP
- Designing Practical NLP Solutions – Ines Montani, Co-Founder, Explosion; and Core Developer, spaCy and Prodigy
- Creating machine learning predictors from text – Julia Silge, Data Scientist and Software Engineer, RStudio PBC
- Lessons learned building a real-world customer support virtual assistant for troubleshooting technical problems – Claudiu Branzan, Machine Learning Engineering Senior Manager, Applied Intelligence, Accenture
2:30 PM -- Lightning Talks Session 2
3:30 PM -- Social Hour
DAY 3: SATURDAY, December 12, 2020
10:00 AM - 12:00 PM Pacific
10:00 AM -- Hackathon Presentations
11:00 AM -- Closing Remarks and Award Ceremony
Thursday, December 10, at 10:00 AM through Saturday, December 12, at 12:00 PM
Apply by December 5. Applicants will then be notified if they have been selected to attend.
BIDS’ TextXD 2020 conference on December 10-12 will feature an all-new, 48-hour TextXD ‘Law & Society’ Hackathon starting at 10:00 AM on Thursday, December 10. The deadline to apply for the TextXD Hackathon has been extended to December 5. The event will conclude with a celebration and awards ceremony on Saturday, December 12, at 10:00 AM, featuring a panel of distinguished judges including Janet Napolitano, Professor of Public Policy at Berkeley's Goldman School of Public Policy, former president of the University of California, and former Secretary of Homeland Security under President Obama; and David Barstow, head of investigative reporting at the UC Berkeley Graduate School of Journalism, four-time Pulitzer Prize-winning journalist, and former senior writer at The New York Times. Hackathon participants will work with a unique collection of recent datasets on police misconduct and associated public policy issues in California. Read more: TextXD ‘Law & Society’ Hackathon to focus on police misconduct and data analysis in support of public policy (BIDS News, November 13, 2020).
TextXD 2020 Hackathon — Special Guest "Judges"
- Roxanna Marie Altholz, Clinical Professor of Law and Co-Director, International Human Rights Law Clinic, UC Berkeley
- David Barstow, Distinguished Chair in Investigative Journalism, UC Berkeley
- Sarah E. Chasins, Assistant Professor, Electrical Engineering and Computer Science, UC Berkeley
- Janet Napolitano, Professor of Public Policy, Goldman School of Public Policy, and Director, Center for Security in Politics, UC Berkeley
TextXD 2020 Program Committee
- Adam Anderson, UC Berkeley
- Alex de Siqueira, UC Berkeley
- Marsha Fenner, UC Berkeley
- Ciera Martinez, UC Berkeley
- David Mongeau, UC Berkeley
- Heather Haveman, UC Berkeley
- Maryam Vareth, UC Berkeley and UCSF
- Niek Veldhuis, UC Berkeley
Subscribe to the TextXD Mailing List for further details and updated information.
Contact: Questions may be directed to email@example.com.
Adam G. Anderson advised graduate students in the Computational Social Science Training Program, managed the Computational Social Science Forum, and helped organize BIDS's cross-domain (XD) initiatives. He was also a lecturer in Digital Humanities and Data Science and an academic coordinator for Digital Humanities at Berkeley, where he co-authored and designed the Theory and Methods curriculum for the DigHum Minor and Certificate Program. In addition, he was a co-coordinator of the Digital Humanities Working Group (DHWG) and the Computational Text Analysis Working Group (CTAWG), as well as the topic area lead in Network Analysis and Text Analysis at the D-Lab. His work brings together computational linguistics, archaeology, and Assyriology/Sumerology to quantify the social and economic landscapes that emerged during the Bronze Age in the ancient Near East. His research interests include network analysis, archival studies, geospatial mapping, and language modeling (NLP). He applies these mixed methods to large datasets of ancient texts and archaeological records in order to better understand the lives of individuals and groups within ancient societies, and to relate these findings to the context of our lives today. He holds a PhD in Near Eastern Languages and Civilizations from Harvard University, an MA (Zwischenprüfung) in Assyriology from Ludwig-Maximilians University, and a BA in Linguistics from Brigham Young University.
Alex de Siqueira is a postdoctoral researcher at BIDS, working on open source algorithms for processing computed tomography (CT) 3D images. He received his MS and PhD from the State University of São Paulo, Brazil, applying image processing tools to challenges in materials science and geochronology. A core developer of scikit-image, he has been an open source and free software enthusiast since his first contact with Linux in 2000, and has contributed to several projects and events in Latin America and Europe. Alex also worked as a postdoctoral fellow at the State University of Campinas, Brazil, and at TU Bergakademie Freiberg, Germany, where he created pytracks and wrote Octave - Your first steps on scientific programming (in Brazilian Portuguese).
Marsha Fenner is the Communications/Program Manager for the Berkeley Institute for Data Science. In this role, she works to connect researchers and data science practitioners across a wide array of academic disciplines, facilitate interdisciplinary collaboration, and implement training and education programs that engage and expand BIDS' and Berkeley’s active and diverse research community. Fenner has managed communications, training/education/outreach programs, and administrative operations for scientific programs and research initiatives at UC Berkeley and Lawrence Berkeley National Laboratory, including the Innovative Genomics Institute, the DOE Joint Genome Institute, and Berkeley Lab's Advanced Light Source. She holds an MA in philosophy and comparative religious studies, and a BA in classics, philosophy and mathematics.
Heather Haveman is a Professor of Sociology and Business at UC Berkeley. She holds a BA in history and an MBA from the University of Toronto, and a PhD in organizational behavior and industrial relations from UC Berkeley. After holding positions at Duke University's Fuqua School of Business, Cornell University's Johnson Graduate School of Management, and Columbia University's Graduate School of Business, Professor Haveman joined UC Berkeley in July 2006. Her research examines how organizations, the fields in which they are embedded, and the careers of their members and employees evolve. Her current work involves American magazines and wineries, Chinese listed firms, and the emerging marijuana market in several US states.
BIDS Biology and Environmental Sciences Lead Ciera Martinez focuses on data-intensive research projects that aim to understand how life on this planet evolves in response to the environment and climate, especially projects involving large and complex datasets. A long-time open science advocate, Ciera has been involved with, and continues to be interested in, training for open data, education, publishing, and software, including developing community standards for data management practices. As a 2019 Mozilla Open Science Fellow, she connected her love of data and museums and worked on projects aimed at understanding and increasing the usability of biodiversity and natural history museum data. She received her PhD in Plant Biology from UC Davis, researching the genetic mechanisms regulating plant architecture. She then became an NSF Postdoctoral Fellow in the Molecular and Cellular Biology Department at UC Berkeley, studying genome evolution. She was also a BIDS postdoctoral Data Science Fellow for three years, working on undergraduate research practices, data science training, community development, and best practices for data science, diversity and inclusion, and computational research.
David Mongeau, now the Founding Director of the School of Data Science at the University of Texas at San Antonio, was the Executive Director of BIDS from April 2018 to June 2021. During that time, in collaboration with the Faculty Director and Faculty Council, he set strategic direction and oversaw BIDS research, training, and outreach. He also led the institute’s industry and foundation relations and its engagement with other UC and global research institutes, all in support of the overarching BIDS mission to create and deploy data science methods, practices, and technologies that enable discovery. Previously, David co-led the data analytics institute at Ohio State; worked at Battelle, where he championed its proposal for an AI and cybersecurity company, now Covail; and worked for many years at Bell Labs, starting on the team that introduced the first C++ compiler and UNIX System V and leaving after building a global business and technology consulting practice, now part of Nokia Bell Labs Consulting. David earned his undergraduate degree at Carnegie Mellon University, and later earned a graduate degree at Rensselaer Polytechnic Institute and an MBA from Purdue University. Many of his interests lie beyond data science, embracing the humanities and arts.
Maryam Vareth leads BIDS’ data science research efforts in the Health & Life Sciences. Dr. Vareth is a Co-Director of the Innovate For Health initiative, a collaboration among UC Berkeley, UCSF, and Janssen Pharmaceutical Companies of Johnson & Johnson. As an experienced engineer, researcher, and data scientist, she applies mathematics, statistics, and physics to address unmet needs in healthcare and enhance patients’ experience during their medical journey. She is an advocate for “data-driven” medicine, and in particular for linking medical imaging data with medical diagnostics and therapeutics to extract clinically relevant insights through open research and open source practices. Dr. Vareth received her BS and MS training in Electrical Engineering and Computer Science (EECS) from UC Berkeley, where she was awarded the prestigious Regents’ and Chancellor’s Scholarship. She completed her PhD through the joint UC Berkeley-UCSF Bioengineering program as a National Science Foundation Fellow, where she was awarded the Margaret Hart Surbeck Endowed Fellowship for Interdisciplinary Research for her work developing new techniques and algorithms for the acquisition, reconstruction, and quantitative analysis of Magnetic Resonance Spectroscopy Imaging (MRSI). The goal of this work was to increase the speed, sensitivity, and specificity of MRSI in order to improve the management of patients with brain tumors. She completed her postdoctoral fellowship at UCSF, combining structural, physiological, and metabolic imaging data from large clinical trials to quantitatively characterize heterogeneity within malignant brain tumors.
Niek Veldhuis is Professor of Assyriology (cuneiform studies) in the Department of Near Eastern Studies. He received his PhD from the Rijksuniversiteit Groningen (The Netherlands) in 1997 and came to Berkeley in 2002. His primary interests are the intellectual history of ancient Mesopotamia (History of the Mesopotamian Lexical Tradition, 2014) and Sumerian literature (Religion, Literature and Scholarship: The Sumerian Composition Nanše and the Birds, 2004). He is director of the NEH-supported Digital Corpus of Cuneiform Lexical Texts and a member of the international Oracc Steering Committee, which provides tools and standards for the digital publication of cuneiform texts to scholars worldwide. Today, his main research focus is developing computational text analysis scripts (primarily in Jupyter Notebooks) for cuneiform datasets.
David Bamman is an assistant professor in the School of Information at UC Berkeley, where he works on applying natural language processing and machine learning to empirical questions in the humanities and social sciences. His research often involves adding linguistic structure (e.g., syntax, semantics, coreference) to statistical models of text, and focuses on improving NLP for a variety of languages and domains (such as literary text and social media). Before Berkeley, he received his PhD in the School of Computer Science at Carnegie Mellon University and was a senior researcher at the Perseus Project of Tufts University.
Sarah E. Chasins joined the UC Berkeley EECS faculty in 2020. Her lab invents usable programming tools to democratize computation, especially to empower social scientists, journalists, and other non-traditional programmers. Her research focuses on programming languages (PL) and program synthesis, with an emphasis on (i) work at the intersection of PL and human-computer interaction, and (ii) work at the intersection of PL and social good.
Chris Kennedy is an instructor in psychiatry at Harvard Medical School / Massachusetts General Hospital. He has a PhD in biostatistics from UC Berkeley. He is a senior fellow at UC Berkeley’s D-Lab and is affiliated with the Integrative Cancer Research Group and the Division of Research at Kaiser Permanente Northern California. At BIDS, he was a Biomedical Big Data Training (BBDT) Data Science Fellow while a PhD student in biostatistics at UC Berkeley, where he worked with Alan Hubbard. He was also a D-Lab instructor and consultant, and an NIH biomedical big data trainee. His methodological interests encompassed targeted machine learning, randomized trials, causal inference, deep learning, text analysis, signal processing, and computer vision. His applications were primarily in precision medicine, public health, genomics, and election campaigns. His software projects included the SuperLearner ensemble learning system and varImpact for variable importance estimation, and he leveraged high-performance computing on the Savio and XSEDE clusters to accelerate his work. Prior to Berkeley, he worked in political analytics in DC, running dozens of randomized trials and integrating machine learning into multimillion-dollar programs to improve voter turnout for underrepresented Americans. He has also worked to support climate change action through Al Gore’s Climate Reality Project and the Yale Program on Climate Change Communication. He holds an M.A. in political science from UC Berkeley, an M.P.Aff. from the LBJ School of Public Affairs, and a B.A. in government & economics from The University of Texas at Austin.
Former BIDS Data Science Fellow Laura K. Nelson is an Assistant Professor of Sociology in the College of Social Sciences and Humanities at Northeastern University. Laura uses computational methods and open source tools, principally automated text analysis, to study social movements, culture, gender, institutions, and organizations. She is particularly interested in developing computational tools that can bolster the way social scientists do inductive and theory-driven research. She received her PhD in sociology from the University of California, Berkeley, and also holds an MA from UC Berkeley and a BA from the University of Wisconsin, Madison. While at UC Berkeley, she was a postdoctoral fellow with Digital Humanities @ Berkeley, where she developed an undergraduate course on computational text analysis in the humanities and social sciences.