One step in the development process for all software is the release. Some projects continually release, while others demarcate certain states of the codebase in a more formal, numbered way. Scientists leading the development of research software may be interested in how I manage my own software releases and encapsulate some reproducibility best practices along the way. A quick Google search turns up a plethora of "software release checklists." This blog post summarizes mine.
One of my projects, PyRK, a Python tool for reactor kinetics, recently had a release. To be as transparent, robust, and citable as possible, the PyRK project employs a release procedure that relies on version control (Git), a ticketing system (GitHub), automated documentation (Sphinx), a website (Read the Docs), a test framework (nose), continuous integration (Travis CI), and an archival service that generates digital object identifiers, or DOIs (figshare). At the heart of all of this is a software-development process inspired by GitFlow, which incorporates testing and code review. I won’t cover that workflow in this blog post; I’ll instead focus on the process in the image, a typical scientific software-release procedure. While the particular software stack I use in this example is tailored to Python projects, the concepts are common across programming languages.
Maintaining clear documentation and a transparent development process, especially in the context of a release, allows other scientists to discover and use our scientific software. PyRK is a Python project, so we use Sphinx, with its autodoc extension and sphinx-apidoc utility, to automatically document the classes and functions that make up the source code. This requires that we add well-formatted docstrings to our classes and functions; Sphinx does the rest. The generated documentation is then hosted on a website built with Read the Docs. To help us in the development process, we collect all proposed changes to the codebase in “issues” and group those issues into “milestones” using GitHub. When all issues in the “release v0.2” milestone are complete, we’re ready to release the new version of the code. The closed issues in the milestone capture the changes in the release, so users can review exactly what changed.
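For example, a docstring written in Sphinx's reStructuredText field-list style might look like the sketch below. The function itself is a hypothetical illustration, not actual PyRK code; the point is the docstring format that autodoc picks up:

```python
def doppler_feedback(temp_change, coeff):
    """Compute the reactivity insertion from Doppler feedback.

    This is a hypothetical example of a Sphinx-friendly docstring:
    autodoc reads the field list below and renders the parameter
    and return descriptions in the generated HTML documentation.

    :param temp_change: change in fuel temperature [K]
    :type temp_change: float
    :param coeff: Doppler reactivity coefficient [1/K]
    :type coeff: float
    :returns: reactivity inserted by the temperature change
    :rtype: float
    """
    return coeff * temp_change
```

With docstrings like this throughout the codebase, the documentation stays next to the code it describes, so it is more likely to be updated when the code changes.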
Much of the work of ensuring code robustness happens during development but pays off at release time. During development, we constantly add unit tests to ensure that code changes achieve the behaviors we expect. Those unit tests are run with the nose testing framework and exercised across supported platforms using Travis CI. As part of the release procedure, it is important to double-check that test coverage is sufficient and that all tests pass.
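To sketch what this looks like in practice: a nose-style unit test is simply a function whose name begins with `test_`, which the `nosetests` command discovers and runs automatically. The model function below is a simplified stand-in, not actual PyRK code:

```python
def decay_heat_fraction(t):
    """Simplified stand-in model: decay heat as a fraction of
    operating power, decaying as a power law in time after shutdown."""
    return 0.066 * t ** -0.2


def test_decay_heat_is_positive():
    # Physical sanity check: decay heat is positive after shutdown.
    assert decay_heat_fraction(10.0) > 0.0


def test_decay_heat_decreases_with_time():
    # Decay heat should fall monotonically as time passes.
    assert decay_heat_fraction(100.0) < decay_heat_fraction(10.0)
```

Tests like these encode the physical behavior we expect, so a change that silently breaks the model fails loudly in continuous integration.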
Scientific software often contains scientific models and algorithms, which users should cite when the software contributes to their scientific work. Citing code has historically been tricky, but some services now archive a snapshot of the code and assign it a DOI, which can then be cited like a journal article. figshare allows me to create a citable DOI from the release tag on GitHub. I also include this information in a citation file in the software repository and on the software website.
My checklist, summarized in the image, goes like this:
- Address open issues associated with the release milestone (use a ticketing system like GitHub's)
- Confirm the closed issues in the milestone capture the changes represented by the release
- Confirm all tests pass (with a test framework like nose)
- Confirm builds and tests pass on all supported platforms (use continuous integration like Travis CI)
- Update documentation (automated with a tool like Sphinx)
- Update other website information, like new authors, release notes, etc. (Read the Docs)
- Tag the revision of the code with a release tag (Git/GitHub)
- Upload it to an archive that generates DOIs (figshare)
- Update the citation file to include the release citation from figshare (CITATION.md)
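The citation file itself can be short. As a sketch of what a CITATION.md might contain (the authors, version, and DOI below are placeholders, not PyRK's actual citation details):

```
# Citing PyRK

If PyRK contributes to a publication, please cite the archived release:

    Author A. and Author B. (Year). PyRK vX.Y.Z. figshare.
    https://doi.org/10.XXXX/XXXXXX
```

Keeping this file in the repository means anyone who clones the code finds the preferred citation alongside it.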
A subtle issue not covered in the above list is that of “release candidates.” In projects with more than a few users, each release turns up a small flurry of bugs encountered by the “tire kicking” of its users. To fix those bugs quickly and include the fixes in the final release, many teams put out a test release, wait for the bug reports, fix those bugs, and then proceed with the real release. This iteration on release candidates isn’t as necessary for small projects, but it is something to be aware of as your project grows.
So, in conclusion:
- For transparency, we use a version-control tool, a ticketing system, an automated documentation tool, and a website.
- For robustness, we employ unit tests, continuous integration, and code review in a controlled way.
- For citability, we have a citation file and use a service like figshare to acquire a DOI.