I recently spoke at Berkeley Lab to a group of visitors that included the Senior Policy Advisor to Speaker Nancy Pelosi, Kenneth DeGraff. The purpose of the talk was to highlight the work being done with two of the five unique national user facilities here at Berkeley Lab: the Advanced Light Source (ALS) and the Molecular Foundry.
Across several data modalities, these facilities depend on image processing to give users full access to instruments that are pushing the frontiers of scientific research and collaboration around the world. The challenge is to extract information from the data these high-throughput instruments produce fast enough, while petabytes of data pile up in backlogs every year.
The state-of-the-art instruments maintained at these facilities create a fast-growing deluge of data that currently demands 35 petabytes of data traffic each month. Advances in data analysis and storage techniques are therefore crucial to handle the ~7 exabytes of scientific data that these user facilities are expected to generate by 2021. To give you an idea of the overwhelming amount of data, a single instrument can generate 1 TB per hour. 1 TB corresponds to 170 Netflix movies, and even though we have become really good at binge-watching, this is too many pictures to scrutinize properly for scientific purposes. Software tools are urgently needed to automate the indexing, ranking and searching of scientific data, efforts I'm currently engaged in at Berkeley Lab and at BIDS.
A broader BIDS effort in image processing is the ImageXD initiative (Image Across Domains), which promotes knowledge transfer and understanding of image processing techniques across various domain areas and applications through free software and data. Applications are now open for this year’s ImageXD Conference, being held at BIDS on September 11-13, 2019. This year’s conference will feature global representatives in image processing and cutting-edge research in fields and applications ranging from new microscopy instruments to new algorithms for processing multiresolution data.
Together with UC Berkeley, LBL Computing Sciences has built an international network of scientists that builds tools for the common good. For instance, together we are building free Python software packages for image analysis, which provide building blocks for engineering sophisticated frameworks that automate scientific experiments in materials science and biology. Ideally, our efforts to improve image processing technology will help save lives and improve the well-being of humanity and Earth’s environment.
BIDS Director and 2011 Nobel Laureate Saul Perlmutter has been leading BIDS since its creation in 2014, with support from the Moore and Sloan Foundations and UC Berkeley. I am really grateful for the opportunity to have been a data scientist at BIDS since then, working on democratizing scientific methods and software for impactful data science. This includes exploring applications of new machine learning methods for social welfare and public health, e.g. my work with the Center for Recognition and Inspection of Cells (CRIC), which endeavors to improve cervical cancer analysis and diagnosis from Pap smear images.