Data Science Coast To Coast — Data equity and open science
Date: Wednesday, April 21, 2021
Time: 12:00–1:00 PM Pacific
The Data Science Coast to Coast (DS C2C) seminar series is hosted jointly by seven academic data science institutes: BIDS, NYU’s Center for Data Science, Rice University’s Ken Kennedy Institute, Stanford Data Science, the University of Michigan’s Michigan Institute for Data Science (MIDAS), the University of Washington’s eScience Institute, and Johns Hopkins University’s Institute for Data Intensive Engineering and Science (IDIES). The series provides a unique opportunity to foster a broad-reaching data science community. In the first half of 2021, DS C2C will host five seminars, each featuring one faculty member and one postdoctoral fellow from two universities. Each speaker will give a 20-minute talk about ongoing projects and motivating issues, followed by 20 minutes of discussion with the audience. These seminars will be the launching point for follow-on research discussion meetings that will hopefully lead to fruitful collaborative research.
Data Equity: A Core Requirement for Responsible Data Science
H. V. Jagadish, Director of Michigan Institute for Data Science and Professor of Computer Science and Engineering, University of Michigan
Until recently, we regularly heard statements like “Let the data speak for themselves”. Today, we instead hear worries about the fairness of data-driven systems and AI. Nevertheless, a focus on a specific formulation of fairness in one data science step is far too narrow to be the whole story. We need to address inequitable representation in the data record, inequities due to the data scientist’s world view being reflected in the model, inequities in the resulting outcomes, and inequities in access to the fruits of the analysis. In this talk, I will lay out a research agenda in this direction, and invite you to join me.
Open Science as a Community of Practice
Ciera Martinez, Biodiversity and Environmental Sciences Lead, Berkeley Institute for Data Science (BIDS), University of California, Berkeley
The academic research system is not built to incentivize open science practices, but transparency and reproducible methodology allow researchers to critically assess and build upon results to fuel scientific discovery, and they support a more collaborative and equitable research community. Open science and data practices are often presented as ideals, but rarely do we train for how to handle the intricacies that emerge from every unique research project life cycle. In this talk I will present the ERP (Explore, Refine, and Produce) workflow – a three-phase data analysis workflow that guides researchers to create reproducible and responsible data analysis workflows. Each phase is centered on how to make decisions based on the audience to which the research is communicated, the research products created, and the career aspirations of the researchers involved. We hope this work helps create a community of practice for how we design and train for reproducible data-intensive research and helps demystify data analysis for both students new to research and current researchers who are new to data-intensive work.
All events in the series are free to attend, and all who are interested are welcome and encouraged to participate. Questions may be directed to Jing Liu (firstname.lastname@example.org), Managing Director of MIDAS.
BIDS Biodiversity and Environmental Sciences Lead Ciera Martinez focuses on data-intensive research projects that aim to understand how life on this planet evolves in reaction to the environment and climate – especially projects involving large and complex datasets. A long-time open science advocate, Ciera has been involved with and continues to be interested in working on training for open data, education, publishing, and software, including developing community standards for data management practices. As a 2019 Mozilla Open Science Fellow, she connected her love of data and museums and worked on projects aimed at understanding and increasing the usability of biodiversity and natural history museum data. She received her PhD in Plant Biology from UC Davis, researching the genetic mechanisms regulating plant architecture. She then went on to become an NSF Postdoctoral Fellow at UC Berkeley in the Molecular and Cellular Biology Department, studying genome evolution. She was also a BIDS postdoctoral Data Science Fellow for 3 years, working on undergraduate research practices, data science training, community development, and best practices for data science, diversity and inclusion, and computational research.