Common Sense Approaches to Sharing Tabular Data Alongside Publication

Nicholas J. Tierney, Karthik Ram

Patterns
December 10, 2021

The bigger picture: Without data, there is no science. Science needs to be reproducible so we can trust the results and progress as a field. Although academia generally appreciates the benefits of data sharing, research data are usually only available to the original researchers. If data are shared, they often lack the documentation to make them easy to reuse. There is not a strong culture of data sharing, and we believe this is due to a lack of incentives and infrastructure for making data useful. While past papers focus on general practices for sharing data and code, we focus on a specific audience: academics working in data-science-adjacent fields who are about to submit for publication. We provide guidelines for sharing tabular data alongside research that a researcher could pick up and use today. In the future, we hope the culture around data sharing will change and that sharing will be rewarded in science. A new normal is needed where data are submitted with every research publication, and datasets are easily discovered, shared, and extended in other analyses.

Summary: Numerous arguments strongly support the practice of open science, which offers several societal and individual benefits. For individual researchers, sharing research artifacts such as data can increase trust and transparency, improve the reproducibility of one's own work, and catalyze new collaborations. Despite a general appreciation of the benefits of data sharing, research data are often only available to the original investigators. For data that are shared, a lack of useful metadata and documentation makes them challenging to reuse. In this paper, we argue that a lack of incentives and infrastructure for making data useful is the biggest barrier to creating a culture of widespread data sharing. We compare data with code, examine computational environments in the context of their ability to facilitate the reproducibility of research, provide some practical guidance on how researchers can improve the chances of their data being reusable, and partially bridge the incentive gap. While previous papers have focused on describing ideal best practices for data and code, we focus on common-sense ideas for sharing tabular data for a target audience of academics working in data-science-adjacent fields who are about to submit for publication.

Figure: Mechanism-Incentives Diagram

"The mechanisms for behavior change, the incentives, and our assessment of where the elements of data, code, and computational environment rank in terms of completing these aspects. We note that data are often required, but the preceding steps are not, in contrast to code, which has no policy."



Featured Fellows

Karthik Ram

Senior Research Data Scientist