The Virtuous Cycle of a Data Ecosystem.

Bradley Voytek

doi:10.1371/journal.pcbi.1005037

Abstract

Digital data of all types are being created at an ever-increasing rate, doubling approximately every two years. Annual data creation rates are estimated to reach 44 trillion gigabytes by 2020 [1]. Similarly, the rate at which primary scientific data are being collected is accelerating [2]. This astounding growth in scientific data creation has led to the contemporary discussion of scientific data sharing policies. Many of the criticisms leveled against data sharing have focused on practical issues such as the economics and logistics of data storage, the technical challenges of doing so, or appropriate attribution of credit [2–9]. In contrast, the arguments in favor of data sharing have focused largely on scientific replication, reproducibility [10], facilitation of collaborative research, and increased citations for publications that share data [11]. This is largely an ethical argument, wherein there is an obligation to share data collected using public funds [3–6,12,13]. Rather than focusing on the much-discussed arguments against data sharing (cost, infrastructure, curation, privacy, and attribution/credit concerns), in this Perspective, I outline the overlooked benefits of data sharing: novel remixing and combining as well as bias minimization and meta-analysis. I argue that we must weigh the costs against the true value of the possible benefits. If the decision for any individual researcher, university, or funding agency to implement data sharing policies comes down to a cost-benefit analysis based solely on replication versus storage, that analysis may be artificially tipped against sharing data because more subtle, but critical, benefits are overlooked. These hidden benefits of data remixing cannot be appreciated when considering each individual dataset as an independent entity, and thus a richer consideration of those benefits is warranted.
Although there is some evidence that, on the local scale, research groups may not make use of shared data [14], in this Perspective, I outline the ways in which research groups are beginning to take advantage of open data in novel, and sometimes surprising, ways. Rather than arguing for a centralized, large-scale data repository, I am advocating for a more organic development wherein we, institutionally, encourage the growth of a data ecosystem. This can be done via multiple venues, such as the general scientific data sharing sites figshare (https://figshare.com/) or the Dryad Digital Repository (http://datadryad.org/), each of which, in addition to Nature Publishing Group’s recently launched peer-reviewed data sharing journal, Scientific Data [15], provides citable Digital Object Identifiers for the data themselves. Such developments are addressing concerns regarding credit and help motivate data curation and contextualization. A data sharing ecosystem provides space for multiple diverse datasets to intermingle to encourage new, multidisciplinary discoveries for current and future scientists.

Highlights

Digital data of all types are being created at an ever-increasing rate, doubling approximately every two years.
To give but a few examples: studying monkey social behaviors and eating habits led to insights into the origins of HIV [52]; research into how algae move toward light paved the way for optogenetics, the use of light to control neural activity [53]; and black hole research spurred the development of algorithms eventually used as part of the 802.11 specifications ubiquitously used in modern Wi-Fi [54].
It is our duty to preserve our data so that future generations will not be hindered by our prejudiced interpretations and analytical limitations.

Summary

Modern science is creating data at an unprecedented rate, yet most of these data are being discarded. Raw scientific data, when they are published at all, are provided in a very limited form. Multidimensional datasets, rich with hidden information, are reduced to summary statistics filtered through the limitations of contemporary methods and technologies and through the biased lens of the originating research group. The massive loss of raw data currently underway, and the lack of a system for discovering them, hinder scientific progress. In this Perspective, I argue that our limited contemporary view of data sharing masks the long-term scientific and medical benefits that it could make possible.

OPEN ACCESS


Journal: PLOS Computational Biology	Publication Date: Aug 4, 2016
Citations: 22	License type: CC BY 4.0


