Tales from the Docathon: How to Get Communities to Write Documentation

April 28, 2017

by Chris Holdgraf and Nelle Varoquaux

How did you first learn to analyze data, build software, or contribute back to the open source community? Chances are, you relied heavily on documentation from the many projects in the open source community. Documentation is one of the most important components of the open science ecosystem—and it's everywhere! From examples that provide inspiration for things you can do with a package to a tutorial that teaches you the concepts and ideas behind what a project is trying to do, documentation helps provide an entry point and a guide for open source projects.

Unfortunately, documentation is also often under-developed and under-appreciated, which is where the Docathon comes in.

The Docathon is a week-long sprint that aims to motivate open source developers to spend less time building new features and squashing bugs and more time creating examples, tutorials, API descriptions, and guides to help their users learn more about their projects.

This year’s Docathon was held the week of March 6 and was organized by a group of researchers at BIDS with key support from members of the eScience Institute at the University of Washington and the Graduate Center at the City University of New York. This post tells the story behind the Docathon and the week of activity, its structure, who participated, and what worked well. We hope it inspires others to host their own community-driven Docathons!

The Docathon Begins!

The Docathon was a hybrid remote/local event, designed to let people contribute from wherever they were while still providing some glue to keep everyone on the same page. At Berkeley, the week began with a morning of tutorials covering best practices and tools that help developers create good documentation. These tutorials were live-streamed so that remote participants could take part as well. We had a great turnout of enthusiastic people looking to make their documentation a little bit better! We covered topics like how to organize documentation on a website, how to automatically generate beautiful galleries of examples, and how to generate and automatically host documentation using GitHub. The tutorials are available on our YouTube channel!

Next, we began the real fun. The Docathon is all about celebrating the joy and value of building beautiful documentation, so we wanted a fun way to motivate people throughout the week. We had users sign up either as individuals or as projects so that we could keep track of what they were up to during the week.

Users from all over the world signed up, representing a collection of both new and established developers in the open source community:

We kept track of everybody’s commit activity during the week and asked participants to tag any documentation-related commits with “DOCATHON” or “DOC.” This tagging allowed us to create a leaderboard to add a little friendly competitive spirit to the week! It also let us keep track of how much activity occurred as a result of the Docathon. Here’s an example of how the week went:

As you can see, we saw a flurry of commits representing improvements to documentation across the open source community. Some projects chose to improve the wording and structure of their examples, while others updated their docstrings to adhere to the “numpydoc” structure. Some projects even used the week to entirely revamp their websites.
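To make the tagging and the docstring convention a bit more concrete, here is a minimal sketch in Python. It is not the code we ran during the Docathon: the function name, the sample commits, and the exact matching rule (the tag must appear as a standalone upper-case word in the commit message) are all assumptions made for illustration. The function's own docstring follows the numpydoc layout, with its "Parameters" and "Returns" sections.

```python
import re
from collections import Counter

# Assumption: a commit counts as documentation work when its message
# contains "DOC" or "DOCATHON" as a standalone, upper-case word.
DOC_TAG = re.compile(r"\b(?:DOCATHON|DOC)\b")


def doc_commit_leaderboard(commits):
    """Rank authors by their number of documentation-tagged commits.

    Parameters
    ----------
    commits : iterable of (str, str)
        Pairs of ``(author, message)``, one per commit.

    Returns
    -------
    list of (str, int)
        ``(author, count)`` pairs sorted from most to fewest tagged
        commits. Authors with no tagged commits are omitted.
    """
    counts = Counter(
        author for author, message in commits if DOC_TAG.search(message)
    )
    return counts.most_common()


# Hypothetical commit log, made up for this example.
sample = [
    ("alice", "DOC: clarify install steps"),
    ("bob", "Fix off-by-one in parser"),
    ("alice", "DOCATHON add gallery example"),
    ("carol", "DOC fix typo in README"),
]
print(doc_commit_leaderboard(sample))
```

A real leaderboard would pull the `(author, message)` pairs from each project's repository history rather than from a hand-written list, and a project might prefer a looser or stricter matching rule than the one assumed here.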

Docstars

The biggest docstar of the week was pycortex, an open source Python project for visualizing brain activity. This is a complex and relatively new package that lets you visualize the brain interactively from within a web browser. The pycortex team’s original documentation was built on a vanilla version of Read the Docs, but by the end of the week, they had a beautiful revamped website that used the sphinx-gallery plugin to generate galleries of examples.
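For projects curious about what enabling sphinx-gallery involves, the core of it is a few lines in the Sphinx `conf.py` file. The fragment below is a generic sketch rather than pycortex's actual configuration, and the directory names are assumptions about how a project might be laid out.

```python
# conf.py (fragment): a hypothetical Sphinx configuration, not pycortex's own.
extensions = [
    "sphinx_gallery.gen_gallery",  # builds a gallery from example scripts
]

sphinx_gallery_conf = {
    # Assumed layout: example scripts live in ../examples relative to conf.py.
    "examples_dirs": "../examples",
    # Directory, relative to conf.py, where generated gallery pages are written.
    "gallery_dirs": "auto_examples",
}
```

With a configuration along these lines, the example scripts in the source directory are rendered into gallery pages each time the documentation is built.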

Other projects used the week as an opportunity to give their documentation some much-needed love. For example, DIY is a C++ library that helps implement data-parallel algorithms that run both distributed across many nodes and out of core. Over time, its documentation fell out of sync with the code. The project authors used the week to update both code snippets and text to reflect the current state of the library. DIY also followed one of the week’s tutorials and now automatically deploys its documentation to GitHub, making it much easier to keep up to date.

Finally, we had some well-established packages participate as well. matplotlib has a large and constantly growing website full of glorious examples, tutorials, and explanations. The matplotlib team used the week as an opportunity to rearrange some of these examples and to recreate them in a nicely laid out sphinx-gallery. This sets the foundation for even more beautiful narrative-based examples in the future.

What's Next

We’re excited for the next official Docathon week, which will be held sometime in early 2018. In the meantime, we’d love to see other projects or teams host their own mini-Docathons to motivate their members to improve their documentation.

Next time, we hope to get even more projects in the open source community involved. We will also provide more tools to help projects get assistance from willing contributors, both to improve their documentation and to build a stronger developer community. We’d love to see the Docathon continue to grow, as documentation is one thing that binds all open source projects together, no matter what language they are built with.

Until then, keep documenting!

The Docathon Team

Acknowledgements

We wouldn’t have been able to put the Docathon together without a ton of great help from sponsors across the open science domain. First, we want to thank BIDS for hosting the planning team in the months leading up to the event. In addition, we received support from BIDS, the University of Washington, and the Graduate Center at the City University of New York to host Docathon working groups. We would also like to thank our funders: the Gordon and Betty Moore Foundation, the Alfred P. Sloan Foundation, and Continuum Analytics.

Thanks to Gina Helfrich and Ali Ferguson for feedback.