Use Of Shared Data Research Articles

Digital data of all types are being created at an ever-increasing rate, doubling approximately every two years. Annual data creation rates are estimated to reach 44 trillion gigabytes by 2020 [1]. Similarly, the rate at which primary scientific data are being collected is accelerating [2]. This astounding growth in scientific data creation has led to the contemporary discussion of scientific data sharing policies. Many of the criticisms levied against data sharing have focused on practical issues such as the economics and logistics of data storage, technical challenges for doing so, or appropriate attribution of credit [2–9]. In contrast, the arguments in favor of data sharing have focused largely on scientific replication, reproducibility [10], facilitation of collaborative research, and increased citations for publications that share data [11]. This is largely an ethical argument wherein there is an obligation to share data collected using public funds [3–6,12,13]. Rather than focusing on the much-discussed arguments against data sharing (cost, infrastructure, curation, privacy, and attribution/credit concerns), in this Perspective I outline the overlooked benefits of data sharing: novel remixing and combining as well as bias minimization and meta-analysis. I argue that we must consider the weight of the costs against the true value of the possible benefits. If the decision for any individual researcher, university, or funding agency to implement data sharing policies comes down to a cost-benefit analysis based solely on replication versus storage, that analysis may be artificially tipped in favor of not sharing data because more subtle, but critical, benefits are overlooked. These hidden benefits of data remixing cannot be appreciated when considering each individual dataset as an independent entity, and thus a richer consideration of those benefits is warranted.
Although there is some evidence that, on the local scale, research groups may not make use of shared data [14], in this Perspective, I outline the ways in which research groups are beginning to take advantage of open data in novel, and sometimes surprising, ways. Rather than arguing for a centralized, large-scale data repository, I am advocating for a more organic development wherein we, institutionally, encourage the growth of a data ecosystem. This can be done via multiple venues, such as the general scientific data sharing sites figshare (https://figshare.com/) or the Dryad Digital Repository (http://datadryad.org/), each of which, in addition to Nature Publishing Group’s recently launched peer-reviewed data sharing journal, Scientific Data [15], provides citable Digital Object Identifiers for the data themselves. Such developments are addressing concerns regarding credit and help motivate data curation and contextualization. A data sharing ecosystem provides space for multiple diverse datasets to intermingle to encourage new, multidisciplinary discoveries for current and future scientists.

The rapid growth of the internet and related technologies has already had a tremendous impact on scientific publishing. This journal has given attention to open access publishing (Ascoli 2005; Bug 2005; Merkel-Sobotta 2005; Velterop 2005), to reforming the review process (De Schutter 2007; Saper and Maunsell 2009) and to the problems with getting authors to share their data (Ascoli 2006; Kennedy 2006; Teeters et al. 2008; Van Horn and Ball 2008) and how to enhance the use of shared data (Gardner et al. 2008; Kennedy 2010).
But the impact of the internet and data warehousing on science will be much larger and there is a growing interest in how these technologies can be leveraged to improve the scientific process (Hey et al. 2009). Let’s travel towards the future and imagine that not only the tools and infrastructure are available to share scientific data at any time after it is generated, but that it has also become standard practice for the community to do so. How this can be achieved is not the focus of this editorial, instead I want to speculate on the relationship between scientific papers and data repositories (Bourne 2005, 2010; Cinkosky et al. 1991) in such an environment. It is important for the scientific community to discuss these issues now because, while these technologies are expected to radically improve the scientific process, they will also change the way in which our work is evaluated.
I propose that we should distinguish data publishing from paper publishing (Callaghan et al. 2009; Cinkosky et al. 1991) and, when established for specific scientific fields, promote data publishing as the primary outlet for much of the scientific output.
A good metaphor for data publishing is to look at how complete organism genomic sequences are published in high impact journals now (Srivastava et al. 2010; Warren et al. 2010). Such papers serve really two goals: to announce the availability of the genome sequence in GenBank and to describe some scientific conclusions based on the analysis of the genome. The perceived importance of the latter determines whether a high impact journal will accept the paper and therefore the authors spend a lot of effort in hyping this part. But are these two components irrevocably intertwined? Couldn’t one just publish the data, in this case by depositing the complete sequence in a database, and announce this fact through a form of publication? The analysis can then be published separately at a later time or distributed over different papers, etc. This is not done because at present the publication of the paper in the high impact journal is considered to be the optimal reward for the researchers, both for career advancement and for success in obtaining new grants (Bourne 2005). I call data publication a method where the data providers, who may be different from the people who analyze the data, receive credit for their work when they deposit the sequence in the database and where subsequent access to the data is tracked and considered equivalent to paper citation.
There are a number of advantages to considering data publication as a separate process. First, credit assignment becomes more explicitly defined among the authors. Several journals (like Nature, Science, the PLoS series, etc.) have taken steps towards a more granular credit assignment by asking authors to explicitly list their

Articles published on Use Of Shared Data

Recommendations for sharing network data and materials

African researchers do not think differently about Open Data

Status, use and impact of sharing individual participant data from clinical trials: a scoping review

National platform for Rare Diseases Data Registry of Japan.

The Safe and Effective Use of Shared Data Underpinned by Stakeholder Engagement and Evaluation Practice.

Availability and Use of Shared Data From Cardiometabolic Clinical Trials.

EVALITA Goes Social: Tasks, Data, and Community at the 2016 Edition

CHARMed PyMca, Part I: A Protocol for Improved Inter‐laboratory Reproducibility in the Quantitative ED‐XRF Analysis of Copper Alloys

VARIABILITY IN CLINICAL RESEARCH DATA MANAGEMENT PRACTICES: LESSONS FROM THE MALARIA COMMUNITY

The Virtuous Cycle of a Data Ecosystem.

Exploring the determinants of scientific data sharing: Understanding the motivation to publish research data

Data Publishing and Scientific Journals: The Future of the Scientific Paper in a World of Shared Data

Spatiotemporal integration of molecular and anatomical data in virtual reality using semantic mapping

A short form of the Dimensional Assessment of Personality Pathology-Basic Questionnaire (DAPP-BQ): The DAPP-SF

Ethical and Legal Considerations Regarding Disputed Authorship with the Use of Shared Data

Challenges, approaches and architecture for distributed process integration in heterogeneous environments

A survey of cache coherence schemes for multiprocessors
