TextXD brings together researchers from a wide range of disciplines who work with text as a primary source of data. We work to identify common principles, algorithms, and tools that advance text-intensive research, and to break down the boundaries between domains in order to foster exchange and new collaborations among like-minded researchers. Talks will range from the theory of NLP and deep learning to applied analyses and new software packages.
2018 TextXD Symposium
December 5-7, 2018
190 Doe Library, UC Berkeley
REGISTER by November 25 - Registration is free, but pre-registration is required due to limited seating. Registrants will be accepted via email confirmation on a space-available basis.
DRAFT AGENDA - This page is being updated as more speakers are confirmed.
- Wednesday, Dec. 5th: Training workshops - "Intro to Text Analysis"
- Thursday, Dec. 6th: Invited talks, discussions, and applied collaboration sessions
- Friday, Dec. 7th: Invited talks, discussions, and applied collaboration sessions
- Laurent El Ghaoui, UC Berkeley Electrical Engineering and Computer Sciences (EECS)
- Niek Veldhuis, UC Berkeley Near Eastern Studies
Niek Veldhuis' keynote presentation on "Sumerian Word Embeddings" (Photo: J. Dugan)
- Nick Adams, Goodly Labs
- Adam Anderson, UC Berkeley Near Eastern Studies
- AJ Alvero, Stanford Education
- Alina Arseniev-Koehler, UCLA Sociology
- Geoff Bacon, UC Berkeley Linguistics
- Devin Cornell, UCSB Sociology
- Milena Gianfrancesco, UCSF
- Suzanne Tamang, Stanford
- Jaren Haber, UC Berkeley Sociology
- Chris Hench, Amazon
- Caroline Le Pennec, UC Berkeley Economics
- Laura Nelson, Northeastern U. Sociology
- Alex Paxton, U. Connecticut
- Tanya Roosta & Emmanuel Vallod, SumUp Analytics
- Abigail See, Stanford NLP
- Jae Ho Sohn, UCSF Radiology
- Deborah Sunter, Tufts
- Rochelle Terman, U. Chicago Political Science
- Manoj Tiwari, Google
Contact: Chris Kennedy (email@example.com)
This event is hosted by the Berkeley Institute for Data Science (BIDS), and co-sponsored by the D-Lab, UCSF Bakar Computational Health Sciences Institute, and the UC Berkeley School of Information.
Niek Veldhuis is Professor of Assyriology (cuneiform studies) in the Department of Near Eastern Studies. He received his PhD at the Rijksuniversiteit Groningen (The Netherlands) in 1997 and came to Berkeley in 2002. His primary interests are in the intellectual history of ancient Mesopotamia (History of the Mesopotamian Lexical Tradition, 2014) and Sumerian literature (Religion, Literature and Scholarship: The Sumerian Composition Nanše and the Birds, 2004).
He is director of the NEH-supported Digital Corpus of Cuneiform Lexical Texts (http://oracc.org/dcclt) and is a member of the international Oracc Steering Committee, providing tools and standards for digital publication of cuneiform texts to scholars worldwide.
Today, his main research focus is on developing computational text analysis scripts (primarily in Jupyter Notebooks) for cuneiform datasets.
Nick Adams, PhD, was a full-time research fellow at the Berkeley Institute for Data Science (BIDS). He is a sociologist, and his substantive work analyzes protester and police interactions as revealed through 8,000 news accounts of nearly 200 US Occupy campaigns. His TextThresher software provides the human-powered machinery to process these data in high quantity with high quality. A builder of research communities across UC Berkeley's campus, Nick founded and leads the Computational Text Analysis Working Group at Berkeley’s D-Lab and BIDS' Text Across Domains (Text XD) initiative. He also serves on the Social Science Research Council’s Committee on Digital Culture and is a contributing editor to Mobilizing Ideas, the online journal of social movements research.
Christopher Hench was a BIDS Data Science Fellow and a PhD Candidate in German Literature and Medieval Studies at UC Berkeley from 2017 to 2018. He studied computational approaches to the formal analysis of lyric and epic poetry, and reading soundscapes. More broadly, he had a particular interest in the challenges of domain adaptation for NLP and in algorithms for the detection and scoring of text reuse. Christopher was also the Program Development Lead for Digital Humanities at Berkeley and the D-Lab at Berkeley, where he collaborated on several research projects and taught Python and Git workshops. He also coordinated the modules development effort in cooperation with BIDS, D-Lab, and the Data Science Education Program (DSEP).
Laura received her PhD in sociology from the University of California, Berkeley. She has an MA from UC Berkeley and a BA from the University of Wisconsin, Madison. She was a postdoctoral fellow at Digital Humanities @ Berkeley and a BIDS Data Science Fellow, developing a course for undergraduates on computational text analysis in the humanities and social sciences. Laura is currently an Assistant Professor of Sociology and Anthropology at Northeastern University.
Laura uses computational methods and open source tools, principally automated text analysis, to study social movements, culture, gender, institutions, and organizations. She is particularly interested in developing computational tools that can bolster the way social scientists do inductive and theory-driven research.
When she's not being an academic, Laura loves to travel, hike, and run. She also plays the violin and is always looking for ways to keep that part of her life.
Alexandra is a BIDS data science fellow and a postdoctoral scholar working with Tom Griffiths in the Institute of Cognitive and Brain Sciences. She got her PhD in cognitive and information sciences from the University of California, Merced, in December 2015.
Her work explores human communication in data-rich environments. From capitalizing on large-scale real-world corpora to capturing multimodal experimental data, her research seeks to understand how context changes communication dynamics. Broadly, her work integrates computational and social perspectives to understand interpersonal interaction as a nonlinear dynamical system.
Relatedly, Alexandra also develops research methods to facilitate quantitative research on interaction and encourages others to use data-rich computational methods through teaching and service. As part of that effort, she works with the Center for Data on the Mind to foster the application of big data to questions about cognition and behavior.
Deborah Sunter received a B.S. in Mechanical and Aerospace Engineering at Cornell University. There she developed a nanosatellite mission that was successfully launched into orbit. Although fascinated by aerospace applications, the time-critical issue of global warming shifted her focus in graduate school to explore renewable energy. Specializing in computational modeling of thermo-physics in multiphase systems, she developed a novel solar absorber tube and received her Ph.D. in Mechanical Engineering at the University of California, Berkeley. The need for a global environmental solution led her to do research abroad in both Japan and China. After receiving her doctorate, she advanced her understanding of energy policy as an AAAS Science and Technology Policy Fellow at the U.S. Department of Energy. She is now a postdoctoral fellow in the Renewable and Appropriate Energy Laboratory at the University of California, Berkeley. Her research interests include data science for sustainability, national energy planning, city-integrated renewable energy systems, environmental justice, and clean technology innovation.
Heather A. Haveman is currently involved in two data-science projects:
Marijuana. Working with PhD student Cyrus Dioun, I am studying the emerging marijuana market. Every week, we scrape data for medical and recreational marijuana providers from commercial aggregators. We will assess how shifts in regulatory regimes at the local, state, and national level affect the growth of this market and the proliferation of novel products, and we will study spillovers across regulatory boundaries. We are also scraping data on user-identified effects of different marijuana strains from commercial aggregators. We will use topic modelling to identify themes. Combining topic data with other data on marijuana providers will allow us to map how marijuana is perceived by different users (e.g., recreational versus medical users). We will also be able to chart both the positive and negative effects of various strains of marijuana.
Wine. With Cyrus Dioun, I am studying the US wine market. We are gathering wine ratings and tasting notes from several sites. Our goal is to assess how relationships between reviews (tasting scores) and product attributes (words and phrases describing wine) vary across types of wine, by price, by location and how they co-evolve over time. We will begin by counting the number of times certain words and phrases appear in each review, analyzing three aspects of cultural vocabulary: words and phrases denoting particular varietals, descriptions of specific tastes and smells, and general evaluation. Lists in all three categories will come from several wine-tasting guides. We will regress wine ratings on the number and presence of particular vocabulary items and will assess the contingent effects of wine type, price, and location.
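The counting step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's actual pipeline: the vocabulary lists, the three sample reviews, and their scores are invented toy values (the real lists would come from wine-tasting guides), and the regression is reduced to a single predictor rather than the fuller model with wine type, price, and location.

```python
import re
from collections import Counter

# Hypothetical vocabulary lists, one per category named in the text.
# The real lists would come from wine-tasting guides.
VOCAB = {
    "varietal": ["pinot noir", "chardonnay", "merlot"],
    "taste_smell": ["cherry", "oak", "vanilla", "citrus"],
    "evaluation": ["elegant", "balanced", "flat"],
}


def count_vocab(review, vocab=VOCAB):
    """Count whole-word or whole-phrase matches per category in one review."""
    text = review.lower()
    counts = Counter()
    for category, terms in vocab.items():
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            counts[category] += len(re.findall(pattern, text))
    return dict(counts)


def ols_slope_intercept(x, y):
    """Ordinary least squares fit of y on a single predictor x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Toy reviews and scores, invented purely for illustration.
REVIEWS = [
    ("Elegant pinot noir with cherry and oak.", 92),
    ("Flat chardonnay, faint citrus.", 84),
    ("Balanced merlot: cherry, vanilla, and oak.", 90),
]

taste_counts = [count_vocab(text)["taste_smell"] for text, _ in REVIEWS]
scores = [score for _, score in REVIEWS]
slope, intercept = ols_slope_intercept(taste_counts, scores)
print(f"score ~ {intercept:.2f} + {slope:.2f} * taste_smell_count")
```

A fuller analysis would presumably use one predictor per vocabulary item plus interaction terms, which calls for a multiple-regression routine from a statistics library rather than the hand-rolled single-predictor fit shown here.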
Maryam Vareth is an enthusiastic engineer who is passionate about applying mathematics and physics to address unmet needs in healthcare and life sciences. She is an advocate for “data-driven” medicine and is keen on extracting clinically relevant insights from large-scale medical data, specifically by directly linking medical imaging data to medical diagnostics and therapeutics, and on moving her community towards open source and open research practices.
She received her PhD, MS and BS training from UC Berkeley. Her doctoral work focused mainly on developing new techniques and algorithms for the acquisition, reconstruction and quantitative analysis of Magnetic Resonance Spectroscopy Imaging (MRSI), with the goal of improving the speed, sensitivity and specificity of the data obtained for the management of patients with brain tumors. Her post-doctoral research continued her PhD work, with an emphasis on using a combination of structural, physiological, and metabolic imaging data from large clinical trials to quantitatively characterize heterogeneity within malignant brain tumors.
As an associate specialist, her research will focus on the role of machine learning and other big data approaches in identifying contributors to brain tumors, and on developing tools that predict pathologic and molecular characteristics of tumor aggressiveness from multi-parametric MRI in patients with glioma. The ultimate goal is a completely data-driven model that extracts imaging features, uses them to identify risk factors and predict outcomes, and translates this knowledge into the clinic.
Chris is a PhD student in biostatistics where he works with Alan Hubbard, and is an independent data science consultant. He is also a D-Lab instructor and consultant, and an NIH biomedical big data trainee.
His methodological interests encompass targeted machine learning, randomized trials, causal inference, deep learning, text analysis, signal processing, and computer vision. His applications are primarily in precision medicine, public health, genomics, and election campaigns. His software projects include the SuperLearner ensemble learning system and varImpact for variable importance estimation; he leverages high performance computing on Savio and XSEDE clusters to accelerate his work.
Prior to Berkeley he worked in political analytics in DC, running dozens of randomized trials and integrating machine learning into multi-million dollar programs to improve voter turnout for underrepresented Americans. He has also worked to support climate change action through Al Gore’s Climate Reality Project and the Yale Program on Climate Change Communication.
Jonathan Dugan, Chief Research Officer, has focused his career on the promotion of science, education and open culture. His work in both nonprofit and for-profit businesses includes consulting, business management, entrepreneurship, web services and software development, community engagement, and biomedical informatics systems.
Jonathan works with the Berkeley research community to empower faculty and researchers to develop missing areas of the data science environment (tools, methods, systems, workflows, etc.), and secure the resources to fund them. To accomplish this, he directs our efforts to solve research issues facing the emerging field of data science. He helps us promote our successes, fund our work, and find practical solutions that bring together the best faculty, postdocs, students, and staff to solve immediate challenges for our research and education efforts.
Jonathan completed his PhD in biomedical informatics at Stanford in 2002, where he developed nonlinear mathematical simulations for protein structure modeling. His current research interests include citations, data sharing, software development, community engagement, identity and reputation systems, and applying machine learning techniques to solve research questions.