BIDS Open Software


Project Jupyter is a community of open-source developers, scientists, educators, and data scientists. Its goal is to build open-source tools and create a community that facilitates scientific research, reproducible and open workflows, education, computational narratives, and data analytics. Jupyter supports over 100 programming languages and connects data analytics tools across a range of disciplines and communities.

There are several core projects of Jupyter that the Berkeley Institute for Data Science supports:


Scikit-image is a community-driven Python project, consisting of a vast collection of high-quality, peer-reviewed image processing algorithms that are made available to a global community of researchers free of charge and free of restriction. The library is widely used in many different fields, including astronomy, biomedical imaging, and environmental resource management. Scikit-image was founded by BIDS Research Data Scientist Stéfan van der Walt in 2009.


Mothra analyzes images of butterflies and measures their wing lengths. Using binarization techniques and calculating the resolution of ruler ticks, we read in images of butterflies and output the millimeter lengths of their wings.

The pipeline script combines four modules to analyze an image: ruler detection, binarization, tracing, and final measurement. These modules are located in /butterfly. Python module requirements are listed in requirements.txt.
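The final measurement step can be illustrated with a minimal sketch. This is not Mothra's actual code: the function names `pixels_per_mm` and `wing_length_mm` are hypothetical, and the sketch assumes that adjacent ruler ticks are 1 mm apart and that earlier stages have already produced tick positions and wing endpoints in pixel coordinates.

```python
import math
from statistics import median


def pixels_per_mm(tick_positions):
    """Estimate image resolution from ruler tick positions (in pixels).

    Assumption: adjacent ticks are 1 mm apart. The median gap is used so
    that a missed or spurious tick does not skew the estimate.
    """
    ticks = sorted(tick_positions)
    if len(ticks) < 2:
        raise ValueError("need at least two ruler ticks")
    gaps = [b - a for a, b in zip(ticks, ticks[1:])]
    return median(gaps)


def wing_length_mm(start, end, resolution):
    """Convert the pixel distance between two (row, col) points to mm."""
    length_px = math.dist(start, end)
    return length_px / resolution


# Hypothetical inputs: one tick was missed between 70 and 111.
ticks = [10, 30, 50, 70, 111, 130]
resolution = pixels_per_mm(ticks)                         # 20 pixels per mm
length = wing_length_mm((0, 0), (300, 400), resolution)   # 500 px -> 25.0 mm
```

Taking the median rather than the mean of the tick gaps is a design choice made for this sketch: a single undetected tick doubles one gap but leaves the median unchanged.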

US Research Software Sustainability Institute

This project is conceptualizing a US Research Software Sustainability Institute that will focus on the entire research software ecosystem — including the people who create, maintain, and use research software — to validate and address various classes of concerns impacting all software development and maintenance projects across all of NSF. The proposed long-term goals of the institute could include:


SkyPortal is a fully open-source data portal for the collaborative study and management of time-domain sources and events. It interactively displays astronomical datasets for annotation, analysis, and discovery, and is designed to be modular and extensible, so it can be customized for various scientific use-cases.

Cesium ML

Cesium-ML is an end-to-end machine learning platform for time series that computes machine learning features, builds models, and makes predictions. Cesium has two main components: a Python library and a web application platform that allows interactive exploration of machine learning pipelines. The Cesium library is specifically designed to handle irregularly sampled time series, as is common in astronomy.
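To show why irregular sampling matters when computing features, here is a small hand-rolled sketch in plain Python. It does not use or reproduce the Cesium API; the function names `time_weighted_mean` and `featurize` and the choice of features are assumptions made purely for illustration.

```python
def time_weighted_mean(times, values):
    """Mean of a signal sampled at uneven times, via the trapezoidal rule.

    Assumes `times` is strictly increasing and matches `values` in length.
    """
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need two or more matching samples")
    points = list(zip(times, values))
    area = sum(
        (t1 - t0) * (v0 + v1) / 2
        for (t0, v0), (t1, v1) in zip(points, points[1:])
    )
    return area / (times[-1] - times[0])


def featurize(times, values):
    """Compute a few illustrative features for one time series."""
    points = list(zip(times, values))
    slopes = [
        abs(v1 - v0) / (t1 - t0)
        for (t0, v0), (t1, v1) in zip(points, points[1:])
    ]
    return {
        "amplitude": (max(values) - min(values)) / 2,
        "time_weighted_mean": time_weighted_mean(times, values),
        "max_slope": max(slopes),
    }


# Unevenly spaced observations: a 1-unit gap followed by a 3-unit gap.
times = [0.0, 1.0, 4.0]
values = [2.0, 4.0, 4.0]
features = featurize(times, values)
# features["time_weighted_mean"] is 3.75, while the plain average of the
# values is about 3.33, because the signal sits at 4.0 for most of the span.
```

The gap between the two means is the point of the example: features that ignore observation times can misrepresent a series whose samples are clustered.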


NumPy is the fundamental array package underpinning the Scientific Python ecosystem. BIDS hosts a team of four core developers who work with the NumPy community to develop the library in preparation for the next decade of data science.

NumPy contains, among other things, the following: