Director's Vision 2024

BIDS: a space for Open Scholarship, Open Source, and interdisciplinary collaboration on AI in science and society

April 15, 2024

Note: We are hiring a new Executive Director for BIDS!

Dear Berkeley Community,

I started as Faculty Director of the Berkeley Institute for Data Science (BIDS) in January 2024; today I would like to share some ideas I have for how BIDS can be a partner to your work. In the coming weeks the Associate Faculty Director, Professor Tim Tangherlini, will offer his perspective, as we embark on this journey together.

As we enter a new era for BIDS, now within the newly-approved College of Computing, Data Science, and Society (CDSS), we have an opportunity to rethink our role. When BIDS started, the very question of what Data Science is stood at the forefront. Today we have a college, a major, a minor and other entities on campus with Data Science in their names. But new questions lie ahead, and there is important space not occupied by any of these existing efforts.

Considering BIDS’ historical strengths, existing work in Data Science at Berkeley, broader changes in the scientific community around Open Science and Open Source, and many facets of the AI revolution, I propose the following ideas to guide BIDS’ vision in the coming years. The aim is a vision that is intellectually exciting, broad, and inclusive of multiple perspectives and disciplines, yet concrete enough to operationalize into programs, activities and collaborations.

First - intellectually, I see BIDS as a home for two key sets of ideas and activities, and the interplay between them:

  • Open Science, Open Source and more broadly, Open Scholarship.
  • Interdisciplinary collaborations on AI in the context of both Science and Society.

Aerial image of Berkeley campus, hills above and part of downtown. Logos of several centers, laboratories and institutes in this area are shown, with arrows pointing to their locations.

UC Berkeley's unique intellectual neighborhood, roughly a square mile in size, encompasses numerous centers and institutes with a focus on science, computing and data.

Second - from a social/community perspective, I think BIDS can be a discipline-agnostic hub that rallies the unique ecosystem existing not only inside the University but also in our immediate neighborhood, helping to build stronger connections and collaborations in the future. On campus, I have already connected with teams at centers like BIDMaP, DSE, GIF, Sky Computing, CHMI and D-Lab, and I hope to also engage with BAIR, CLIMB, the Simons Institute, and others with interests in Data Science. Not only is UC Berkeley home to departments, centers and institutes with expertise in virtually every discipline, but we also have nearby an incredible range of scientific and educational entities, including Lawrence Berkeley National Laboratory, the Simons Laufer Mathematical Sciences Institute, the Space Sciences Laboratory, the Lawrence Hall of Science, the International Computer Science Institute and the SkyDeck incubator, to name only those most connected to Data Science. BIDS will build connections and shared activities around questions in data science, open scholarship and AI with as many of these as possible.

By framing our activities under this vision, BIDS will augment, complement and support the mission of many units on campus, and it will be a net positive for Data Science not only in CDSS but across the entire Berkeley community. Below I provide some details, but this is a program I hope to develop in collaboration with many of you - I don’t have all the answers, but we can build exciting new initiatives together!

Open Science, Open Source, and Open Scholarship

Berkeley has played a key role in many developments in the history of Open Source and Open Science. Examples include leadership in (or outright creation of) major Open Source efforts, from Unix to Spark, Scientific Python and Jupyter; founding initiatives to change publishing, such as the Public Library of Science; our libraries leading the fight for open access to the scientific literature; and more. While some of these efforts date back to the 1970s, they were not always seen as central to the mission of science; this is now changing. The White House Office of Science and Technology Policy (OSTP) declared 2023 the Year of Open Science, and today most federal agencies have policies and guidelines that clearly support Open Science practices, such as data sharing requirements and funding for open source software development; examples are available at open.science.gov.

While many scientists support these practices and ideas, most also have many concrete questions, as varied as “how do I build (or fund!) better open source software for my discipline”, “how do I share data sustainably past grant funding periods”, “how do I get career credit for my data curation/sharing work”, “what are the ethical/security/other implications of sharing data in my field with this specific constraint/sensitivity”, and many more. I envision BIDS as a place where those interested in these issues can come together, and we aim to develop multiple activities on this front. We’ll be announcing an exciting new initiative in Open Source soon, and we look forward to working with all those on campus interested in this topic. We also hope to connect with our Libraries (did you know, for example, that Berkeley has a dedicated Open Science Librarian, Sam Teplitzky?), who have worked for a long time in areas of data sharing and open access to the literature.

Leadership in Open Platforms and core scientific open source

Last year, together with my colleague Ida Sim from UCSF and CPH, we launched an initiative to develop Open Source platforms based on Jupyter, to bring the modern data science toolkit to the health care realm, with an emphasis on the role of open standards and patient-centered privacy. I am proud that BIDS helps lead this innovative program, which has the potential to leverage open platforms to transform a critical industry for the better. This program continues BIDS' tradition of connecting computation and data with healthcare, building upon a prior collaboration with UCSF established by Maryam Vareth, BIDS’ leader in health sciences. It is also a natural extension of the impact Berkeley has had in bringing Jupyter to the educational realm, where the Data Science Undergraduate Studies team has built Data Hubs that power dozens of courses on campus and allow us to teach at an unprecedented scale, democratizing access to data science and sharing this knowledge with the world via the annual National Data Science Education Workshop.

These are two examples of how Open Platforms can have a major impact on entire sectors. At BIDS, we will build further efforts in this direction; in partnership with teams like the Schmidt DSE Center and the Geospatial Innovation Facility (GIF), we are already exploring new ideas in geospatial data analysis, sharing common foundations and tools for one of the most important uses of data in today's world.

Furthermore, BIDS plays a leading role in the modern Scientific Python stack, thanks to efforts in recent years by BIDS researchers Stéfan van der Walt and Jarrod Millman. This stack is at the heart of the modern data science infrastructure, but it is not "owned" by Berkeley - our scientists partner with an extended, distributed community of other researchers and developers to build an ecosystem that benefits all. This is how we will build much more in coming years: work grounded in the expertise of our scholars and immediately applied to our research and educational needs, but in open collaboration with partners near and far, to make research and education more impactful, accessible and fair.

We will also seek new resources to reinvigorate the important "XD" program and the initiatives on open science and reproducible research that BIDS researchers have led in the past. The XDs were "cross-domain" workshops centered on a data modality, including TextXD and ImageXD, bringing together interdisciplinary experts and tool builders.

Science and Society in the AI revolution

A key theme BIDS will explore, led by Berkeley’s world-leading faculty in the natural sciences, the social sciences and the humanities, is a broad, interdisciplinary conversation that brings the perspectives of science and society to AI. Amidst all the change, fascinating research and some overblown hype around AI, there is no doubt that important work is being done that will impact all disciplines as well as society at large. I hope for BIDS to serve as a space for seminars, workshops and the seeding of research where scientific and social perspectives are central to the AI work, complementing the outstanding efforts carried out on campus at places like BAIR, the Simons Institute, the CLIMB center and the Sky lab, as well as other centers and multiple academic departments.

In the past, BIDS hosted regular seminars on ML in the physical sciences; since last year, some of that community has effectively found a new home at BIDMaP, where we have been holding a mix of lectures that range from talks narrowly focused on materials science to broad takes on physics, cosmology, astronomy and more. We plan to partner with BIDMaP and others on campus to host a regular seminar series on AI and Science and Society that alternates between the natural sciences and the social sciences. We hope it will become a reference point on campus for high-quality, interdisciplinary engagement on questions such as how best to represent scientific knowledge in AI systems, or what their implications are for human culture.

Core Research Programs

BIDS also supports several ongoing initiatives on campus, each with its own leadership team. I look forward not only to continuing this support but also to enriching their environment as we grow a strong community around the topics above, which has clear synergies with these existing programs. These initiatives include the Computational Research for Equity in the Legal System Training Program (CRELS) led by David Harding and Harpreet Mangat, and a project to build a police misconduct database led by Aditya Parameswaran and David Barstow, as well as other initiatives cultivated at BIDS.


In closing, I hope this vision of supporting the creation of tools, platforms, infrastructure, and a community of knowledge and practice (a highly interdisciplinary one, grounded in principles of openness and collaboration on data science, computation and AI) is valuable for our campus. I look forward to working with many of you in the future; please don't hesitate to email me, join us for upcoming seminars, and connect with our team!

Sincerely,
Fernando Pérez, BIDS Faculty Director