This talk recounts five years of experience building NLP tools for investigative journalism. Things look a lot different when you move from the lab to the newsroom! Standard entity recognition breaks on journalists' input material, topic modeling turns out to be nearly useless because of interpretability issues, and mere OCR is a much bigger problem than it should be. Even the most promising algorithms are not production tools, so reporters cannot use them. This is why I've adopted the mantra of "platforms, not tools." The Computational Journalism Workbench is an end-to-end system for no-coding-required, reproducible data analysis, with a plug-in architecture that allows researchers to deploy advanced techniques. Although it's being built for journalism, the Workbench offers important lessons for applying data science in other disciplines.
Presented at the Berkeley Institute for Data Science on Thursday, November 9, 2017. (Note: Due to an equipment error, the first few seconds of this video are missing. Our apologies for the inconvenience.)