Documentation is the nexus of the open source world. It is the bridge between the concepts in the minds of users and the computational machinery that carries out those concepts. Good documentation is crucial for helping new users, for building a community of practice around packages, and for defining best practices in a field. Bad documentation wastes users’ time, forces developers to spend more time clearing up confusion instead of building new features, and limits a project’s impact. However, documentation is often left behind as the open source world focuses its effort on tools and code. Documentation may be less well-defined and harder to quantify than code. It requires a different skillset from the one needed to write quality code. Its role in the open source community is not well understood.

At the Berkeley Institute for Data Science (BIDS), we’re interested in improving open source practices in the scientific and academic community. Since documentation is a core part of these practices, we wanted to better understand the role that documentation plays in the open source community. We therefore investigated how developers perceive the creation and curation of documentation in the open source world. We created a short survey that asked respondents to describe how documentation fits into their workflow, as well as their beliefs about several other practices in software development. We distributed the survey to scientific Python developers at the SciPy 2017 conference. The anonymized participant responses can be found at this GitHub repository (doi: 10.6084/m9.figshare.5557801).
Here’s a quick rundown of the kinds of activities each participant had performed in their work:
Writing documentation fell in the middle in terms of both credit and enjoyment, while reviewing documentation came close to last in both categories. This suggests that, as a community, we’re not doing enough to incentivize or appreciate documentation.
We decided to dig into this question further to see whether this lack of enthusiasm corresponds to less time spent on documentation. We asked respondents what percentage of their time they usually spend on documentation in their projects, and then what percentage they thought they should spend. The results are below:
For each respondent we show their actual (white dot) and desired (black dot) percent of working time spent on documentation. If we calculate the difference between each individual’s “desired” (should) and “actual” (usual) time spent on documentation, we observe the following:
Blue bars represent respondents who spend less time on documentation than they think they should. The plot above suggests a systematic difference between the time people think they should spend on documentation and the time they actually spend on it. (Note that two of the red vertical bars reflect the responses of two Docathon organizers, who are perhaps more inclined than most to value documentation.) We quantified this further by looking at a histogram of the differences between these two values for each person.
In the histogram above, negative (red) values mean that respondents thought they should spend more time on documentation; positive (blue) values mean they thought they should spend less. The horizontal blue line shows a 95% confidence interval for the mean of this distribution. The data suggest that, on average, respondents believe they should spend roughly 20% more time on documentation.
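The two summary steps described here (a per-person difference, then a 95% confidence interval for its mean) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the analysis behind the plots: the function names are hypothetical, the gap is computed as actual minus desired so that negative values mean "should spend more" (matching the histogram's sign convention), the interval uses a normal approximation because the post does not say how its interval was computed, and the numbers are invented for illustration rather than taken from the survey responses.

```python
import math
import statistics


def documentation_gaps(actual, desired):
    """Hypothetical helper: per-respondent gap, actual minus desired.

    Inputs are percentages of working time, so the gaps are in
    percentage points. Negative values mean the respondent thinks
    they should spend more time on documentation than they do now.
    """
    if len(actual) != len(desired):
        raise ValueError("actual and desired must have the same length")
    return [a - d for a, d in zip(actual, desired)]


def mean_ci95(values):
    """Hypothetical helper: mean and approximate 95% confidence interval.

    Assumption: a normal approximation (mean +/- 1.96 * standard error).
    The post does not state its method; a t-based or bootstrap
    interval would be a reasonable alternative.
    """
    n = len(values)
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    z = statistics.NormalDist().inv_cdf(0.975)     # about 1.96
    return mean, (mean - z * sem, mean + z * sem)


# Invented example responses (NOT the survey data): percent of working time.
actual = [10, 5, 20, 15, 30, 10, 25]
desired = [30, 20, 30, 40, 25, 35, 40]

gaps = documentation_gaps(actual, desired)
mean, (low, high) = mean_ci95(gaps)
print(gaps)
print(f"mean gap: {mean:.1f} points, 95% CI: [{low:.1f}, {high:.1f}]")
```

With a sample this small, a t-based interval (for example using quantiles from `scipy.stats.t`) would be somewhat wider than the normal approximation shown here.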
While the data above are a coarse measure of each person’s time, they suggest that documentation may be systematically under-represented in the workflows of the open source community, and that this may be tied to the lower perceived appreciation and enjoyment of writing documentation. In the coming months, we hope to explore why this discrepancy might exist. Is it a difference in values, or in skills? Are there ways to increase the likelihood that documentation is created in the first place, and then kept well-maintained? We’ll explore these ideas in future work.