Liberating Archives: Opening up Public Data for Open Research

Liberating Archives, formerly known as CapitolQuery, is a project converting the static, difficult-to-research Congressional Record into a well-organized, easy-to-query database that links floor speeches and committee hearing minutes to data describing Members of Congress and their constituencies. In partnership with the Social Science Research Council, we will enable a wave of social science research revealing how Members’ activities in Congress are shaped, or not, by the districts they represent, their own identities, their parties, and more, as well as by the interactions of all these through time. Because the database is an open resource, researchers and the public will be able to ask and answer new questions about gender dynamics in Members’ interactions, shifting legislative priorities, the interplay between electoral and governing incentives, the influence of war on domestic policy, the difference between what political actors say and do, and much more.

Students on this Data Science Discovery Project will work to gather, parse, and publicly share digital archives that have previously been inaccessible for research purposes. Participants will build on previous semesters' data liberation efforts, which included scraping the web for document files while retaining document metadata; programmatically finding and extracting meaningful data objects within the documents; linking those objects to external databases; preparing the compiled textual data for computational analysis in R and Python; and hosting the newly formed database so that the public and other researchers can launch their own studies of the data. This semester, participants will work together to finish what others started, organizing and improving these liberated archives with additional tags, improved querying functionality, and high-value visualizations. Students will finish the semester with a showcase event offering the archives to researchers and the public. Prerequisites: previous participation in the Liberating Archives project, or advanced Flask and/or visualization skills.

BIDS and Goodly Labs are offering this undergraduate research project for the Fall 2019 semester. Eligible undergraduate students may register via UC Berkeley's Undergraduate Research Apprentice Program (URAP); applications are due online by September 3, 2019.

