Computational Social Science Forum — A university map of course knowledge

CSS Training Program

October 26, 2020
12:00pm to 1:30pm
Virtual Participation

Computational Social Science Forum
Date: Monday, October 26, 2020
Time: 12:00-1:30 PM Pacific Time
Location: Register to receive the schedule and access links.

A university map of course knowledge  

Speaker: Zachary Pardos, Associate Professor, Graduate School of Education, UC Berkeley

Abstract: Knowledge representation has gained in relevance as data from the ubiquitous digitization of behaviors amass and academia and industry seek methods to understand and reason about the information they encode. Success in this pursuit has emerged with data from natural language, where skip-grams and other linear connectionist models of distributed representation have surfaced scrutable relational structures which have also served as artifacts of anthropological interest. Natural language is, however, only a fraction of the big data deluge. Here we show that latent semantic structure can be informed by behavioral data and that domain knowledge can be extracted from this structure through visualization and a novel mapping of the text descriptions of elements onto this behaviorally informed representation. In this study, we use the course enrollment histories of 124,000 students at a public university to learn vector representations of its courses. From these course selection informed representations, a notable 88% of course attribute information was recovered, as well as 40% of course relationships constructed from prior domain knowledge and evaluated by analogy (e.g., Math 1B is to Honors Math 1B as Physics 7B is to Honors Physics 7B). To aid in interpretation of the learned structure, we create a semantic interpolation, translating course vectors to a bag-of-words of their respective catalog descriptions via regression. We find that representations learned from enrollment histories resolved courses to a level of semantic fidelity exceeding that of their catalog descriptions, revealing nuanced content differences between similar courses, as well as accurately describing departments the dataset had no course descriptions for. We end with a discussion of the possible mechanisms by which this semantic structure may be informed and implications for the nascent research and practice of data science.

Paper: https://doi.org/10.1371/journal.pone.0233207
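
For readers unfamiliar with analogy evaluation, the idea mentioned in the abstract can be sketched roughly as follows. This is only an illustration, not the paper's implementation: the three-dimensional vectors are hand-made and hypothetical (in the study, vectors are learned from enrollment histories), "Unrelated course" is a made-up distractor, and the helper names `cosine` and `solve_analogy` are assumptions introduced here. The sketch uses the common vector-offset approach, where "a is to b as c is to d" is solved by finding the course closest to b - a + c.

```python
import math

# Hypothetical course vectors, hand-made for illustration only.
# Real vectors would be learned from enrollment data, not written by hand.
course_vectors = {
    "Math 1B": [1.0, 0.0, 0.0],
    "Honors Math 1B": [1.0, 0.0, 1.0],
    "Physics 7B": [0.0, 1.0, 0.0],
    "Honors Physics 7B": [0.0, 1.0, 1.0],
    "Unrelated course": [0.6, 0.6, 0.1],  # made-up distractor
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def solve_analogy(a, b, c, vectors):
    """Return the course d that best completes 'a is to b as c is to d'."""
    # Vector-offset method: the target point is b - a + c.
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # Exclude the three query courses so they cannot be returned as the answer.
    candidates = {name: vec for name, vec in vectors.items()
                  if name not in (a, b, c)}
    return max(candidates, key=lambda name: cosine(target, candidates[name]))


answer = solve_analogy("Math 1B", "Honors Math 1B", "Physics 7B", course_vectors)
print(answer)
```

An analogy counts as recovered when the returned course matches the one expected from prior domain knowledge; the share of analogies recovered this way gives an accuracy figure of the kind reported in the abstract.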

The Computational Social Science Forum is an informal setting for the interdisciplinary exchange of ideas and scholarship at the intersection of social science and data science. Weekly meetings are hosted by researchers from BIDS and D-Lab, and participants engage in a variety of activities such as presentations of work in progress, discussions and critiques of recent papers, introductions to new tools and methods, discussions around ethics, fairness, inequality, and responsible conduct of research, as well as professional development. We welcome social science researchers with interests in data science methods and tools, and data scientists with applications or interests in public policy, social, behavioral, and health sciences. Participants include graduate students, postdocs, staff, and faculty, and members are encouraged to attend regularly in order to foster community around improving computational social science research, supporting the development and research of group members, and fostering new collaborations. This Forum is organized as part of the Computational Social Science Training Program. Meetings are currently held virtually on Mondays at 12:00-1:30 PM Pacific Time, and interested UC Berkeley community members are invited to use this registration form to receive the schedule and access links. Please contact css-t32@berkeley.edu for more information.

Speaker(s)

Zachary Pardos

Associate Professor, Graduate School of Education, UC Berkeley

Zachary Pardos, an Associate Professor at UC Berkeley in the Graduate School of Education, studies adaptive learning and AI. His research focuses on knowledge representation and recommender system approaches to using behavioral and semantic data to map out paths to cognitive and career achievement in K-16. He earned his PhD in Computer Science at Worcester Polytechnic Institute with a dissertation on computational models of cognitive mastery. After completing his PhD in 2012, he spent one year as a Postdoctoral Associate at the Massachusetts Institute of Technology applying adaptive learning paradigms to online learning. At UC Berkeley, he directs the Computational Approaches to Human Learning research lab, teaches in the Graduate School of Education and the Division of Computing, Data Science, and Society, and is an affiliated faculty member in Cognitive Science.