Datastorr: a workflow and package for delivering successive versions of 'evolving data' directly into R.

Daniel S Falster, Richard G FitzJohn, Matthew W Pennell, William K Cornwell

doi:10.1093/gigascience/giz035

Open Access

https://doi.org/10.1093/gigascience/giz035

Abstract

The sharing and re-use of data has become a cornerstone of modern science. Multiple platforms now allow easy publication of datasets. So far, however, platforms for data sharing offer limited functions for distributing and interacting with evolving datasets: those that continue to grow with time as more records are added, errors are fixed, and new data structures are created. In this article, we describe a workflow for maintaining and distributing successive versions of an evolving dataset, allowing users to retrieve and load different versions directly into the R platform. Our workflow uses the tools and platforms used to develop and distribute successive versions of an open source software program, including version control, GitHub, and semantic versioning, and applies them to the analogous process of developing successive versions of an open source dataset. Moreover, we argue that this model allows individual research groups to achieve a dynamic and versioned model of data delivery at no cost.

Highlights

Evolving datasets are those that are often being expanded and improved
We suggest adapting the process of semantic versioning, developed for labelling successive releases of software, to datasets; version labels are assigned in a way analogous to that of software, determined by the structure of the dataset and changes in that structure
Drawing on the semantic versioning of software described at semver.org, we suggest guidelines for labelling a dataset with semantic versioning
Our package, called datastorr, facilitates access to releases of any evolving dataset hosted on GitHub (Fig. 1)
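
The semantic version labels mentioned in these highlights follow the MAJOR.MINOR.PATCH pattern defined at semver.org. As a minimal sketch of why such labels need numeric rather than alphabetical comparison when choosing among dataset releases, the Python snippet below parses a list of release tags and picks the most recent one. The tag list and the function names (`parse_version`, `latest_version`, `change_level`) are hypothetical illustrations; they are not part of datastorr or of the authors' workflow.

```python
import re

# MAJOR.MINOR.PATCH, optionally prefixed with "v" (a common tag convention).
SEMVER = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def parse_version(tag):
    """Turn a tag such as 'v1.2.0' or '1.2.0' into a tuple of ints, e.g. (1, 2, 0)."""
    match = SEMVER.match(tag)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {tag!r}")
    return tuple(int(part) for part in match.groups())


def latest_version(tags):
    """Return the tag with the highest version, compared numerically."""
    return max(tags, key=parse_version)


def change_level(old, new):
    """Report which component ('major', 'minor' or 'patch') first differs."""
    old_parts, new_parts = parse_version(old), parse_version(new)
    if new_parts <= old_parts:
        raise ValueError("new version must be greater than old version")
    for name, a, b in zip(("major", "minor", "patch"), old_parts, new_parts):
        if a != b:
            return name


# Hypothetical release tags for an evolving dataset.
tags = ["0.2.1", "0.9.0", "0.10.0"]
print(latest_version(tags))             # numeric comparison picks 0.10.0
print(change_level("0.9.0", "0.10.0"))  # the minor component changed
```

A plain string sort would rank "0.9.0" above "0.10.0", which is why the sketch compares tuples of integers instead.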

Journal: GigaScience	Publication Date: May 1, 2019
Citations: 2	License type: CC BY 4.0
