Proselint: The Linting of Science Prose, and the Science of Prose Linting

SciPy 2016


July 11, 2016
2:00pm to 3:00pm
Austin, TX

Writing is notoriously hard, even for the best writers, and it's not for lack of good advice — a tremendous amount of knowledge is strewn across usage guides, dictionaries, technical manuals, essays, pamphlets, websites, and the hearts and minds of great authors and editors. But this knowledge is trapped, waiting to be extracted and transformed.

We built Proselint, a Python-based linter for prose. Proselint identifies violations of expert style and usage guidelines. It is open-source software released under the BSD license and works with Python 2 and 3. It runs as a command-line utility or as an editor plugin (e.g., Sublime Text, Atom, Vim, Emacs) and outputs advice in standard formats (e.g., JSON). Though in its infancy, at perhaps 2% of what it could be, Proselint already includes modules that address redundancy, jargon, illogic, clichés, sexism, misspelling, inconsistency, misuse of symbols, malapropisms, oxymorons, security gaffes, hedging, apologizing, and pretension.
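To make the idea of a prose linter concrete, here is a minimal sketch of rule-based checking with JSON output. It is not Proselint's implementation or API: the `RULES` table, the check names, the advice strings, and the `lint` function are all hypothetical, chosen only to illustrate how a pattern-based check might report its findings.

```python
import json
import re

# Hypothetical rule table: (check id, pattern, advice).
# These are illustrative only and are not Proselint's actual rules.
RULES = [
    (
        "redundancy.example",
        re.compile(r"\bvery unique\b", re.IGNORECASE),
        "'unique' does not take a degree; drop 'very'.",
    ),
    (
        "hedging.example",
        re.compile(r"\bI would argue that\b", re.IGNORECASE),
        "Hedging; state the claim directly.",
    ),
]


def lint(text):
    """Return a list of findings, sorted by position in the text."""
    findings = []
    for check, pattern, message in RULES:
        for match in pattern.finditer(text):
            start = match.start()
            # Convert the character offset into a 1-based line and column.
            line = text.count("\n", 0, start) + 1
            column = start - (text.rfind("\n", 0, start) + 1) + 1
            findings.append(
                {
                    "check": check,
                    "line": line,
                    "column": column,
                    "extent": match.group(0),
                    "message": message,
                }
            )
    return sorted(findings, key=lambda f: (f["line"], f["column"]))


if __name__ == "__main__":
    sample = "This result is very unique.\nI would argue that it matters."
    # Emit the findings as JSON so an editor plugin could consume them.
    print(json.dumps(lint(sample), indent=2))
```

Keeping each rule as data (an identifier, a pattern, and a message) is one plausible way to let a rule set grow without changing the checking loop; a real linter would need far richer rules than literal phrases.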

Proselint can be seen as both a language tool for scientists and a tool for language science. On the one hand, it includes modules that promote clear and consistent prose in science writing. On the other, it measures language usage and explores the factors relevant to creating a useful linter.


M Pacer

BIDS Alum - Postdoctoral Scholar

M Pacer was a computational cognitive scientist working as a core developer on the Jupyter Project. Her work focused on developing mechanisms for integrating computational narratives (e.g., Jupyter notebooks) into the scientific publishing pipeline. The long-range goal was to build a data set appropriate for scientific language processing, which required joint inference on three kinds of language: natural language (scientific prose, in which connections to previous work and theory are usually established), mathematical language (equations, formalisms, and theorems that precisely express theoretical relations), and programming language (the explicit or implicit computations that connect data to theories).