January 20, 2021
Berkeley Institute for Data Science
University of California, Berkeley
190 Doe Library
Berkeley, CA 94720
For seven years, the Berkeley Institute for Data Science (BIDS) has been transforming the process of data-intensive discovery and the institutional environment in which scientific discovery takes place. BIDS has advanced discovery by creating new data science tools, methods, communities, and career paths. And BIDS has changed the institutional environment at the University of California, Berkeley, by serving as the fulcrum for its academic community and its collaborators to conduct impactful cross-disciplinary research and training.
The Gordon and Betty Moore Foundation and the Alfred P. Sloan Foundation made all of this possible beginning in 2013 through their Moore Sloan Data Science Environment (MSDSE) grants. BIDS remains ever grateful. We are proud of our history with the Foundations, the University of Washington, and New York University in the Data Science Environment Partnership. We look forward to our future working together through the Academic Data Science Alliance.
As BIDS enters its eighth year, we – the Directors, Faculty Council, and Staff – believe that the Institute has demonstrated what the Foundations expected: how an institution-wide commitment to data science can deliver dramatic gains in scientific productivity and lead to significant new discoveries and institutional change (Source: msdse.org/about). BIDS has indeed matured into an integral and self-sustaining research institute at UC Berkeley. We offer this annual report with much gratitude.
Table of Contents
— Campus Organization Structure
— Campus Fundraising Campaign
— Campus Strategic Plan and Long-term Development Plan
Responses to the Coronavirus Pandemic and Calls for Justice
— BIDS COVID-19 Response
— BIDS Social Justice and Inclusion Response
Achievements and Progress
— Grants and Gifts
— Projects and Publications
— Faculty, Fellows, and Staff
— Data Science Community Building and Outreach
In this, our final annual report to the Foundations, we describe the institutional environment in which BIDS will continue to lead and thrive. We then highlight our efforts to bring data science research, talent, and thought leadership to bear on the defining events of 2020 – the global coronavirus pandemic and the national calls for racial and social justice. These highlights include the increased diversity of our fellowship programs and of the researchers participating in them. We then present some notable impacts and achievements that BIDS faculty, staff, and collaborators made in 2020. Hyperlinks are provided throughout the report for additional information.
Campus Organization Structure
BIDS is an integral part of the newest academic unit at UC Berkeley, as announced in February:
The Division of Computing, Data Science, and Society (CDSS) is the centerpiece of a novel, dynamic structure that enables Berkeley faculty and students to work across disciplinary boundaries to explore the foundations and applications of computing, data science, and information and their implications for society. It includes the Data Science Education Program, the School of Information, and the Berkeley Institute for Data Science, and involves the Departments of Statistics and Electrical Engineering and Computer Sciences. The Division also will include the Data Science Commons, an entity designed to advance outstanding new transdisciplinary programs.
Within CDSS, BIDS has been positioned as its primary interdisciplinary research arm and an important externally facing unit to government and industry. Additional research units are also being incorporated into CDSS. In October, the Center for Computational Biology and the Social Sciences D-Lab joined CDSS.
The inclusion of additional data-centric but more domain-specific research units alongside BIDS exemplifies the “interconnected networks” that were envisioned in the long-term goal set for the MSDSEs: to bring together interdisciplinary people with institutional environments in order to provide them with the resources, freedom, and interconnected networks necessary for science to flourish (Source: msdse.org/about).
Campus Fundraising Campaign
BIDS is featured prominently in media focusing on data science and artificial intelligence, and should benefit from the $6B Light the Way Campaign for Berkeley, which launched in October.
One of the campaign’s four priorities is “Research in the Public Good.” This research priority leads with “Data Science and Artificial Intelligence,” which aims to give students from all backgrounds the skills to analyze data computationally and apply those skills in any field, and to ensure that the uses of data science and artificial intelligence are safe, fair, ethical, and compatible with human intelligence. Data science and AI also permeate the other research areas in the campaign, in particular “Energy, Climate, and the Environment.”
Lighting the Way to the Future of Data Science is a brief video featuring BIDS Director Saul Perlmutter, who describes the role BIDS plays for the campus and in his own research career, along with an inspiring undergraduate. Forthcoming collateral about data science and AI from the UC Berkeley development office will feature Ciera Martinez, Biodiversity and Environmental Sciences Lead, and Alexandre de Siqueira, Postdoctoral Researcher.
Campus Strategic Plan and Long-term Development Plan
BIDS and much of CDSS are slated to move into a new 418,000-gross-square-foot building in 2025, as approved by the UC Regents in September. The approved proposal for the new data science building states, “The Berkeley Institute for Data Science, now in Doe Library, will move in and be an incubator for campus-wide research.”
The proposal aligns with the focus on data science in both the UC Berkeley Strategic Plan and the Long-Term Development Plan. Our current space at Doe Library has been an exemplar for the consultants and architects for the new building. Our new space will be designed for BIDS to help tackle the campus’s Research for the Public Good priority, particularly around innovative solutions for challenges with ethical AI, technical innovation, environmental sustainability, public health, and justice. We expect it to keep BIDS at the crossroads of data science at UC Berkeley, where we will work to achieve the vision for CDSS:
- Educate the next generation to approach data ethically and masterfully.
- Advance the state of the art in the core of computing, data science, information, and statistics.
- Found new fields bringing computing and data science together with other disciplines.
- Make real progress on issues of human and societal significance.
Responses to the Coronavirus Pandemic and Calls for Justice
BIDS COVID-19 Response
On March 11, BIDS celebrated the launch of our new strategic industry relationship with Accenture Applied Intelligence at our last in-person event on the UC Berkeley campus in 2020. Subsequently, on March 16, most of the UC Berkeley research enterprise and all instruction began to be conducted remotely because of the coronavirus pandemic.
The pandemic exacerbated the often-cited shortage of data science and data analysis expertise, which the DSE Partnership was meant to address. Consequently, people at BIDS mobilized to bring our data science research, talent, and thought leadership to bear.
We immediately started to work more closely with the School of Public Health and other parts of CDSS to determine the best and most effective course of action. We also issued a call for researchers and students to volunteer their time, capabilities and resources, so that BIDS could coordinate data science community resources across campus.
Beginning on April 3, BIDS accomplished the following:
1. Provided software development and project management expertise to Faculty Affiliate Maya Petersen for a simulation model, LEMMA (Local Epidemic Modeling for Management & Action), designed to provide regional (e.g., city- or county-level) projections of COVID-19 under various scenarios. Daily projections with uncertainty bounds are made for hospitalizations, ICU use, ventilator use, active cases, and total cases. Karthik Ram, Senior Research Data Scientist, led a team to make the model’s code more robust and turn it into a fully functioning R package. The model is currently being used in regional planning to predict hospital needs.
2. Launched the BIDS Data Science Services and Consulting portal to help match the needs of domain experts, such as epidemiologists and clinicians, with volunteer data science experts from BIDS and across UC Berkeley.
3. Collaborated with computational epidemiology, public health, and visualization researchers at UC Berkeley and Georgia Tech to create COVIDVIS, a tool for visualizing the impact of seven pandemic interventions: school, restaurant, and business closures; border restrictions; emergency declarations; banning gatherings; and stay-at-home orders. Steph Eaneff, I4H Fellow, consolidated, structured, and documented public policy data and contributed to the data visualizations. COVIDVIS has been explored by numerous researchers, public health agencies, and the public, and a peer-reviewed publication is forthcoming.
4. Developed a consolidated data structure and a user interface, built in R Shiny and hosted online, that allow users to compare results across epidemiological models to understand how a COVID-19 outbreak might progress over time and across geographies. This work by Steph Eaneff, I4H Fellow, has been used as a teaching tool at Georgetown University for both undergraduate and graduate courses and was consulted by those working to support the federal public health response to COVID-19. The work is documented on GitHub.
5. Organized a series of webinars called Berkeley Conversations: COVID-19. The webinars engaged BIDS Faculty Affiliates and other UC Berkeley faculty in panel discussions around urgent topics about the pandemic. BIDS and CDSS conducted six Berkeley Conversations in total, including one that addressed the timely topic of election security during the pandemic:
- Making sense of data, social distancing and what lies ahead - April 3
- COVID-19 — Creating Informed Responses - April 7
- Understanding and Seeking Equity Amid COVID-19 - April 21
- Tracking, data privacy, and getting the numbers right - May 13
- Election integrity and Security – October 26
- Understanding and seeking truth during the pandemic – December 8
The most widely watched webinar was on April 7 with 1,764 live viewers. It was streamed over 3,395 times in 2020. As part of this webinar, BIDS Faculty Affiliates Henry Brady, Sandrine Dudoit, and Maya Petersen discussed their data-intensive research in biostatistics, epidemiology, public policy, and public health.
For the December 8 webinar, we invited BIDS Alum Nick Adams to share information about collaborative methods and tools for citizen scientists to engage with publicly available data, including Public Editor, which he created with Saul Perlmutter and others at BIDS. Today, Public Editor is being used to assess credibility in the news. BIDS Faculty Affiliate Hany Farid and Professor Deirdre Mulligan also participated in this webinar.
6. Began an ongoing collaboration with the Veterans Health Administration (VHA) of the Department of Veterans Affairs on Synthea, an open source synthetic data generator. Haley Hunter-Zinck, I4H Fellow, is working with the VHA and Booz Allen to transform the Synthea COVID-19 dataset into a format that matches a VHA shared COVID-19 data resource.
Additionally, beginning in April, Faculty Affiliates Maya Petersen, Uroš Seljak, and Bin Yu, Research Affiliate Neil Davies, Researcher Alexandre de Siqueira, and BIDS Alum Rebecca Barter contributed several open access papers to support the international COVID-19 research response. Four papers have since been peer-reviewed and published.
- Total COVID-19 Mortality in Italy: Excess Mortality and Age Dependence through Time-Series Analysis. April 20, 2020, medRxiv.org.
- COVID-19 pandemic reveals the peril of ignoring metadata standards. June 19, 2020, Scientific Data.
- Natural stings: alternative health services selling distrust about vaccines on YouTube. July 1, 2020, Frontiers, SocArXiv.
- Evaluation of a novel community-based COVID-19 ‘Test-to-Care’ model for low-income populations and two additional papers. October 9, 2020, PLOS.
- Curating a COVID-19 data repository and forecasting county-level death counts in the United States. November 3, 2020, Harvard Data Science Review.
These papers reflect the BIDS community’s commitment to open access publishing as well as open source software, which were integral to the Data Science Environment Partnership as it developed over the past seven years.
BIDS Social Justice and Inclusion Response
In May 2020, the killing of George Floyd heightened national awareness of racism in our country. As with our response to COVID-19, we began working with other parts of CDSS to determine how best to respond, and eventually came to leverage our new strategic industry relationship with Accenture Applied Intelligence.
We recognized an immediate opportunity to act on our commitment to reach as diverse a pool of candidates as possible for two of our data science fellowship programs. Faculty Affiliates Cathryn Carson, Fernando Pérez, and David Harding, along with many BIDS Staff, also participated in and contributed to a discussion of Anti-racist Data Science Pedagogy in early June. Later, we directed our data science expertise across multiple domains and our open research software toward an ambitious proposal to promote fair policing.
Beginning on May 26, BIDS:
1. Issued the call for nominations to launch our Computational Social Sciences Training Program (CSSTP), with an emphasis on nominating students from underrepresented groups who are interested in data science in the following research disciplines: Sociology, Demography, Social Epidemiology, Public Health Policy, Social Welfare, and Public Policy. The call resulted in welcoming a diverse cohort of five new graduate student fellows to BIDS in July. Contributors included Faculty Affiliates David Harding and Maya Petersen; David Mongeau, Executive Director; and Adam Anderson, Research Training Program Manager.
2. Directed the call for applications for our Innovate for Health (I4H) program, which was released in June, to sites such as DiversityinHigherEducation.com, IMDiversity.com, DiversityWoman.com, and the WorkplaceDiversity.com Network – Veterans, Hispanic, Disability, OutandEqual. Moreover, Maryam Vareth, Health and Life Sciences Lead, spearheaded an effort to incorporate the principle of “assessing achievement relative to opportunity” into the recruitment process. Achievement relative to opportunity is a framework that supports a fair and equitable assessment of career progression and achievements over a period of time, given the opportunities available to the candidates. This framework helps ensure that the overall quality and impact of achievements are given more weight than their quantity, rate, or breadth relative to personal, professional, and other circumstances. The effort was aimed at growing diversity within our workforce. This call for applications resulted in a cohort of six experienced researcher fellows who joined BIDS in October (>50% women). Other contributors at BIDS were Faculty Affiliate Sharmila Majumdar and David Mongeau, Executive Director.
3. Responded to a request from the Innocence Project and the National Association of Criminal Defense Lawyers (NACDL) to BIDS Director Saul Perlmutter to help further develop a prototype database of police misconduct records, with the aims of promoting fair policing and preventing the use of excessive force across the United States.
BIDS responded by immediately consulting with NACDL, as well as the Legal Aid Society, about how to evolve the prototype into a fully open source solution. We then developed a comprehensive proposal to create a sustainable, trusted network for the multiple constituents in courtrooms, newsrooms, community organizations, and research universities to access, understand, and use the data in the misconduct records and related sources. Our solution includes the addition of data science methods and tools for
- Intelligent interfaces for tagging and feature extraction
- End user programming tools for data ingestion
- Data wrangling and integration
- Cryptographically secure analytics
It also introduces interdisciplinary research involving faculty from African American studies, computer science, journalism, law, physics, public policy, social welfare, sociology, and statistics. The research addresses:
- Administrative police data collection, analysis, and dissemination
- Qualitative interpretation and implications of police data
- Racial disparities in police deployment, stops, searches, arrests, and use and severity of force
- Disparities in prosecutorial decisions and their connection to police misconduct and/or the police misconduct database
- Theoretical and empirical analysis of systemic forces at work in police misconduct discourses
Since October, proposals for the Community Law Enforcement Accountability Network (CLEAN) have been under consideration by three philanthropies.
4. Responded to a request received by Executive Director David Mongeau from Accenture Applied Intelligence, a BIDS strategic industry member, to work with Upwardly Global in supporting skilled immigrants and refugees who are interested in data science and AI and are seeking to rebuild their careers in the U.S.
BIDS responded first by launching a series of webinars for Upwardly Global staff and for ~200 immigrants and refugees. The first webinar, “What are the building blocks to succeed in data science?”, was held on November 18 for staff members who counsel job seekers; during it, BIDS staff researchers provided an overview of data science skills, roles, and career paths, and of how best to perform in interviews for a variety of roles and positions. The second webinar, designed in December specifically for the job seekers themselves, will provide training in data science methods and tools when it launches in February 2021. From the webinar goal statement:
“In 2020 and now in 2021, as the country battles Covid-19, we continue to hear of difficulties faced by essential industries, including healthcare, IT, and logistics, in finding workers to fight the pandemic and help our country rebuild. BIDS aims to help hundreds of Upwardly Global job seekers who have the data science and AI skills needed to answer the call to fill those shortages and address these crucial needs.”
- Reproducibility and Open Science theme, which included “identifying and promoting repositories for sharing data and workflows, developing techniques to query and analyze shared data and workflows that will facilitate reuse, and creating software tools to better support sharing and reproducibility.”
- Education and Training theme, which included “[v]iewing education in its widest sense, the aim is to make data science technology and expertise far more accessible.”
- Careers theme, which aimed to “create and sustain long-term career trajectories for a new generation.”
Achievements and Progress
In this section of our annual report, we present some notable achievements and progress made in 2020 by BIDS faculty, staff, and collaborators. We also provide updates about our faculty, fellows, and staff and about our data science community building and outreach.
Grants and Gifts
BIDS received several grants and gifts in 2020, including the following:
- BIDS Strategic Industry Member. A generous gift from Accenture Applied Intelligence. A new strategic relationship to support interdisciplinary research and training that will advance the field of data science. This collaboration aims to explore major social and scientific challenges, such as ethical AI, biomedicine, and environmental sustainability in California.
- Computational Social Sciences Training Program. A five-year, $1.2 million grant from the National Institutes of Health (NIH) Office of Behavioral and Social Sciences Research (OBSSR) and its partner institute, the Eunice Kennedy Shriver National Institute of Child Health and Human Development. Our first NIH T32 training grant, which prepares students to take advantage of advances in computing and data science that can enable their research in demography, public health, public policy, social welfare, and sociology.
- The Next Decade of Scientific Python. A three-year, $1.7M research grant from the Gordon and Betty Moore Foundation, in support of developing a community-vetted decadal plan for the Python programming language for the future of data science.
Projects and Publications
Projects launched or continuing in 2020 and publications during the year included the following.
Fellows and Graduate Research
BIDS Data Science Fellows
- Ivana Malenica co-authored (with BIDS Faculty Affiliate Laura Waller and BIDS Alum Henry Pinkard) an article entitled "Learned adaptive multiphoton illumination microscopy."
- Váleri N. Vásquez published work on “Promoting linguistic diversity in community and stakeholder engagement,” “Principles for data analysis workflows,” and “Proposed marine protected area (MPA) to mitigate encroaching human activity and climate change on Antarctic Peninsula.”
- Elleni Hailu has published research on “Neighborhood social environment and changes in leukocyte telomere length: The Multi-Ethnic Study of Atherosclerosis (MESA).”
- Ángel Mendiola Ross has published research on “New study sheds light on the ineffectiveness of addressing social inequities through police interventions.”
- Steph Eaneff published an article in JAMA titled “The Case for Algorithmic Stewardship for Artificial Intelligence and Machine Learning Technologies,” calling for policies that incentivize “algorithm stewardship” and drawing parallels to existing oversight efforts in antimicrobial stewardship and medication use evaluations.
- Haley Hunter-Zinck published a software package and command line interface for Synthetic Electronic Health Records.
The 2020 cohort of fellows began work on six projects, to be completed over the next two years:
- Akram Bayat – Situational awareness for complex patients. Akram is developing a tool that allows physicians to launch, from within the EHR, an integrated display that extracts and visualizes meaningful clinical patterns from a patient's medical history, so that physicians can understand the patient’s current overall clinical status quickly and accurately.
- Reza Eghbali – Neurological treatment recommendation system. Reza is working on a longitudinal MRI analysis capability for brain tumor assessment for a recommendation system that uses imaging and clinical biomarkers.
- Ben Lacar – Social determinants of health screening for informed and targeted care. Ben is creating an algorithm that extracts social factors from clinical text data and recommends ways to adapt care to a patient's social circumstances or to directly address their social needs.
- Saeed Seyyedi – Digital health platform for corneal opacities and cataracts management. Saeed is developing a mobile app or software platform that uses deep learning and computer vision approaches to detect and classify corneal diseases.
- Elizabeth Smith – Scaling the impact of PSMA-PET in clinical decision-making. This project involves a new imaging approach to detect whether or not prostate cancer has spread to other parts of the body.
- Mithra Vankipuram – Activity-based disease progression monitoring for patients newly diagnosed with Multiple Sclerosis. Mithra is focused on translating research-backed insights about activity-based measures for disease progression prediction into a clinical decision support tool that can be deployed in MS clinics.
The BIDS Undergraduate Internship program grew in 2020. Through the internships, Faculty/Research Affiliates and Research Staff enlist undergraduate students to help design and implement collaborative projects in a variety of research areas to help students develop skills in research, programming, paper writing, project management, and team building. Projects this year included the following:
- Anomaly Detection using Deep Learning for Fundamental Physics Discovery - Ben Nachman, LBNL
- Computational imaging and microscopy - Laura Waller, EECS
- Deep Learning in Medical Imaging - Maryam Vareth, BIDS and UCSF
- Demo Watch - Saul Perlmutter, BIDS; and Nick Adams, Goodly Labs
- Developing an open-source software package for generating synthetic electronic health record data - Haley Hunter-Zinck, BIDS and UCSF
- Medical Imaging Research: k-space MRI Reconstruction Using Deep Learning - Maryam Vareth, BIDS and UCSF
- Research Ready Archives - Saul Perlmutter, BIDS; and Nick Adams, Goodly Labs
- Public Editor - Saul Perlmutter, BIDS; and Nick Adams, Goodly Labs
- Understanding physical processes and making environmental predictions using Deep Learning - Laurel Larsen, Geography
- Using Data Science to Improve COVID-19 Response in Developing Countries - Josh Blumenstock, I-School
Faculty and Staff
- “Array programming with NumPy” (paper) by authors including NumPy development team members currently or previously at BIDS (Stéfan van der Walt, Jarrod Millman, Sebastian Berg, Nathaniel Smith, Matti Picus, and Tyler Reddy) was published in Nature (September). The paper takes modern data scientists on a complete tour of NumPy array programming, from its origins as a small community project to its emergence as the foundation of a vibrant ecosystem of data analysis tools that now span an increasingly broad range of research domains and applications.
- “The Case for Algorithmic Stewardship for Artificial Intelligence and Machine Learning Technologies” (paper) by I4H Fellow Stephanie Eaneff, and faculty at UC Berkeley and UCSF, was published in JAMA Network (September).
- “Natural Stings: Selling Distrust About Vaccines on Brazilian YouTube” (paper) by authors including BIDS Staff Alexandre de Siqueira was published in Frontiers in Communication (October). The authors found that alternative health channels spread distrust of traditional institutions to promote themselves as trusted sources for their audience, profiting from alternative health services.
- “Improved Guarantees and a Multiple-Descent Curve for Column Subset Selection and the Nyström Method” (paper) by Faculty Affiliate Michael Mahoney and BIDS Alums Michał Dereziński and Rajiv Khanna of Berkeley’s Foundations of Data Analysis Institute (FODA) received a NeurIPS 2020 Best Paper Award (December).
- “Veridical Data Science” (article) by Faculty Affiliate Bin Yu, was published in the Proceedings of the National Academy of Sciences of the United States (February). Veridical refers to something that is truthful or coincides with reality. Veridical data science presents Professor Yu’s framework for integrating predictability, computability, and stability (PCS) as three core principles of data science. She and BIDS Alum Rebecca Barter are now adapting the material into “The Elements of Data Science: A Perspective from Applied Statistics and Machine Learning” (book), which is forthcoming from MIT Press (2021).
- “Reading to Write” (article) by Deborah Nolan, Associate Dean and Statistics Professor, and BIDS Alum Sara Stoudt was published in a journal of the Royal Statistical Society in Great Britain (December). The article offers a preview of their book “Communicating with Data: The Art of Writing for Data Science” (book), which is forthcoming from Oxford University Press in May 2021.
Faculty, Fellows, and Staff
BIDS undertook an effort beginning in June, under the direction of the Faculty Council, to review our faculty engagement. The effort aimed to improve and further diversify that engagement. It resulted in a recast BIDS Faculty/Research Affiliation program that spells out more specific opportunities and expectations than we had in the past.
In August, the Faculty Council invited existing faculty fellows to extend their appointments. The council also invited a group of new faculty and researchers to join the community for an initial term of three years, renewable indefinitely depending on their efforts to meet the affiliation criteria. In September, we issued an open invitation to all faculty who depend on data science to advance their research to join BIDS by completing a Statement of Interest. As a result, we now have 59 affiliates representing 30 departments and schools, along with LBNL and UCSF. Their BIDS affiliations will be reviewed annually by the Faculty Council to ensure that they are mutually beneficial.
Recognitions received by our Faculty/Research Affiliates included the following:
- Faculty Affiliate Rediet Abebe was featured in a Forbes article highlighting some of AI/ML's current female role models and thought leaders who have "paved the way for young females entering the workforce" (November).
- Research Affiliate Charu Varadharajan was awarded one of Berkeley Lab's 2020 Director's Awards for Exceptional Achievement in recognition of leadership in the area of data science for earth and environmental science (November).
- Research Affiliate Deb Agarwal was appointed to California Water Data Consortium’s inaugural Steering Committee (October), and received an LBL Director’s Award for “intellectual and strategic leadership in data science and technology at Berkeley Lab, and for the design and development of data systems to address critical scientific problems in support of DOE’s research missions” (November).
Fellows’ Career Transitions
Seven fellows moved on to new positions in 2020, including to Head of Data Science at a healthcare start-up; Deputy Director of Social Services for Solano County, California; data science researcher at Google; and to faculty and researcher positions at Harvard University, Smith College, and the University of California, San Diego.
Staff Additions and Departures
- Ciera Martinez accepted an appointment as Biodiversity and Environmental Sciences Research Lead at BIDS (June).
- Adam Anderson accepted a 50% appointment as Research Training Programs Manager, while retaining his position as academic coordinator and lecturer for Digital Humanities at Berkeley.
- Liliana Cardile accepted a 25% appointment as Industrial Relations Manager for BIDS, while continuing to serve the same function for CDSS more broadly.
- R. Stuart Geiger left BIDS to accept a tenure-track faculty position at University of California, San Diego. He is now Assistant Professor in the UCSD Department of Communication in the Division of Social Sciences with a joint appointment at The Halıcıoğlu Data Science Institute.
Data Science Community Building and Outreach
- As noted previously, BIDS Data Science Services and Consulting was launched in April. We defined for the first time specific areas of support that we now can provide to faculty and researchers upon request:
− Causal inference
− Ensemble models
− Graph/network analysis
− Machine learning
Research Software Methods and Tools
− Scientific Python ecosystem – including Matplotlib, NumPy, SciPy, scikit-image, scikit-learn
− Scientific R – R, RStudio, Shiny, tidyverse
− Jupyter – Jupyter notebooks, JupyterHub, JupyterLab
− Databases – HDF5, SQL
− Continuous Integration/Continuous Deployment (CI/CD)
− Version control
These areas complement data science services offered by other units on campus, such as Data Peer Consulting, the Social Science D-Lab, Research IT, and the University Library. In 2021, we may consider further developing these services by drawing on the DS3 business model at NYU CDS.
- BIDS partnered with the Bakar Computational Health Sciences Institute at the University of California, San Francisco, to launch our new BIDS-BCHSI Research Xchange, a discussion forum for the interdisciplinary exchange of ideas and research at the intersection of healthcare and data science. Innovate for Health (I4H) Fellow Haley Hunter-Zinck presented “A consolidated framework for generating and validating synthetic, structured electronic health record data” on November 2. I4H Fellow Stephanie Eaneff presented “Algorithmic Stewardship” on December 7. The series will continue on a monthly basis into 2021.
- BIDS launched the Computational Social Sciences Forum (September) to provide an informal setting for the interdisciplinary exchange of ideas and scholarship at the intersection of social science and data science, with aims to improve computational social science research, support the development and research of new members, and foster new collaborations. Weekly meetings are hosted by researchers from BIDS and D-Lab, and participants engage in a variety of activities such as presentations of work in progress, discussions and critiques of recent papers, introductions to new tools and methods, discussions around ethics, fairness, inequality, and responsible conduct of research, as well as professional development.
- BIDS joined with five other data science institutes at Rice University, Stanford University, the University of Michigan, and our Data Science Environment Partnership colleagues at NYU and UW to establish the Data Science Coast to Coast seminars. Our goal is to integrate and extend our individual data science communities. BIDS identified and invited two of the three DS C2C speakers for the fall: Talitha Washington, Director of the Atlanta University Center Data Science Initiative, and December’s speaker Jeanne Holm, Deputy Mayor for Budget and Innovation – Chief Data Officer, City of Los Angeles. Dr. Washington presented “Why we can’t wait: Using social justice to transform data science” on October 21. Dr. Holm presented “Using data to improve equity” on December 15.
- TextXD 2020 was presented virtually with two days of talks and a concurrent ‘Law & Society’ Hackathon in December. As part of the Hackathon, participants worked with a unique collection of recent datasets on police misconduct and associated public policy issues in California. The Hackathon Awards Ceremony on December 12 featured distinguished judges: Janet Napolitano, Professor of Public Policy at Berkeley's Goldman School of Public Policy, former president of the University of California, and former Secretary of Homeland Security under President Obama; and David Barstow, head of investigative reporting at the UC Berkeley Graduate School of Journalism, four-time Pulitzer Prize-winning journalist, and former senior writer at The New York Times. Read more in this feature article, TextXD ‘Law and Society’ Hackathon to focus on police misconduct and data analysis in support of public policy.
In this, our final annual report, we described the institutional environment in which BIDS will advance beyond its founding as an MSDSE. We highlighted our efforts to bring data science research, talent, and thought leadership to bear on the defining events of 2020 – the global coronavirus pandemic and the national calls for racial and social justice. We also noted the increase in diversity among those participating in our fellowship programs. And we presented some notable achievements and other progress made in 2020.
We look forward to continuing to collaborate with the Moore and Sloan Foundations, NYU CDS, and UW eScience and the many data science organizations now associated with the Academic Data Science Alliance, where BIDS will continue to contribute substantially to making data science accessible, ethical, and purposeful for all.