Project Archive




The AstroPy Project is a community effort to develop a single core package for astronomy in Python and foster interoperability between Python astronomy packages. The core package has nearly 100 contributors to date and has become one of the most widely used pieces of software in astronomy. The... more

Berkeley Carpentries Club

BIDS' Education & Training Working Group has launched the Berkeley Carpentries Club to connect instructors of research computing workshops in the Berkeley, California, area. The group is a community network of teachers and instructors with diverse interests and backgrounds,... more

Berkeley Ecoinformatics Engine

Predicting biodiversity responses to global environmental change is a huge challenge that requires a holistic understanding of the complex interactions and feedbacks among organisms, climate, and their physical and biotic environments across space and time. Holos: Berkeley Ecoinformatics Engine... more

BIDS Machine Shop

BIDS Senior Research Data Scientist Stéfan van der Walt previously hosted these undergraduate projects through the BIDS Undergraduate Internships Program. As more scientific fields move to intersect with computation, a need arises for software tools that can bridge the gap... more

Center for Recognition and Inspection of Cells (CRIC)

CRIC - the Center for Recognition and Inspection of Cells - uses massive databases of Pap smear images for the analysis and pre-screening of cervical cells. Our team is dedicated to research and development of software tools and organized cell image catalogues through automated cell... more

Cesium ML

Cesium-ML is an end-to-end machine learning platform for time series that computes machine learning features, builds models, and makes predictions. Cesium has two main components: a Python library and a web application platform that allows interactive exploration of machine learning pipelines. The... more

Cryptography of the unknown regions of genomes

BIDS Biodiversity and Environmental Sciences Lead Ciera Martinez originally launched this project as part of UC Berkeley's Undergraduate Research Apprentice Program (URAP). As sequencing genomes becomes faster and cheaper, more sophisticated informatics tools are needed to... more

Data Science Discovery Program

The Data Science Discovery Program (formerly known as the BIDS Collaborative) gives undergraduates opportunities to engage in hands-on, team-based research by connecting them with cutting-edge data science research projects, community impact groups,... more

Deciding Force

The Deciding Force (DF) project is classifying information from more than 8,000 news articles describing more than 35,000 events in which police and the Occupy movement interacted. These data are considered alongside variables describing the governing contexts in which protests occur (including... more

Gamma-ray burst GRB160530a as measured with the COSI telescope during its 2016 balloon flight.

Enabling future gamma-ray space missions

The next generation of gamma-ray space telescopes aims to open a new window into gamma-ray astronomy with unprecedented angular resolution, energy resolution, and sensitivity. The goals of these new telescopes range from achieving a better understanding of the element formation in our Galaxy, to... more

Environment and Society: Data Science for the 21st Century (DS421)

Environment and Society: Data Science for the 21st Century (DS421) was an interdisciplinary graduate training program and National Science Foundation Research Traineeship at UC Berkeley at the interface of data, social, and natural sciences. The project was led by BIDS Senior Fellow... more

Garbage In, Garbage Out? Do Machine Learning Research Papers Report Where Training Data Comes From?

Supervised machine learning is widely used across fields, but major issues are arising around biased, inaccurate, and incomplete training data. In this project, we investigate to what extent published machine learning application papers give specific details about the training data they used,... more

GraphXD - Graphs Across Domains

GraphXD connects scientists, researchers, and theorists interested in graphs from a variety of fields. Inspired by TextXD and ImageXD, GraphXD is organized by Berkeley Institute for Data Science (BIDS) fellows and is open to the campus community.

Hydrologic forecasting for the East River, CO

River flow forecasting is essential for planning reservoir operations, defense strategies against flooding, and fluvial ecosystems management plans. However, flow forecasting is a highly uncertain science. One of the biggest uncertainties lies in resolving the timescales over which water is stored... more

Julia logo


BIDS Senior Fellow David Anthoff led this project, which was offered through UC Berkeley's Undergraduate Research Apprentice Program (URAP) for the Fall 2019 and Spring 2020 academic semesters. Participants in this project worked on the core data science software stack... more

NIMBLE: Numerical Inference for Hierarchical Models Using Bayesian and Likelihood Estimation

NIMBLE is a system for building and sharing analysis methods for statistical models, especially for hierarchical models and computationally intensive methods. NIMBLE is built in R but compiles your models and algorithms using C++ for speed. It includes three components: A system for... more

SNAP Benefit Adequacy

SNAP benefits (or food stamps) are set at the national level without regard to local food prices. SNAP Benefit Adequacy researchers seek to assess how much this matters for recipient health. The maximum SNAP benefit for a given household size is fixed across the country with the exception of... more

Software Carpentry

Software Carpentry is a volunteer organization whose goal is to make scientists more productive and their work more reliable by teaching them basic computing skills. Founded in 1998, it runs short, intensive workshops that cover program design, version control, testing, and task... more

Text Thresher

Text Thresher improves the social science practice of content analysis, making it vastly more transparent and scalable to hundreds of thousands of documents. Text Thresher is a web interface operating in citizen science and crowd working environments like CrowdCrafting. The interface allows... more

The Visible and Invisible Work of Maintaining Open Source Software

Open-source software (OSS) projects are now widely used across academia, industry, government, and non-profits, but many of these projects are heavily or extensively developed and maintained by volunteers. In this project, we investigate the visible and invisible work that project maintainers do to... more