Abstract

The sharing and re-use of data have become a cornerstone of modern science. Multiple platforms now allow easy publication of datasets. So far, however, platforms for data sharing offer limited functions for distributing and interacting with evolving datasets: those that continue to grow with time as more records are added, errors are fixed, and new data structures are created. In this article, we describe a workflow for maintaining and distributing successive versions of an evolving dataset, allowing users to retrieve and load different versions directly into the R platform. Our workflow uses tools and platforms developed for building and distributing successive versions of open source software, including version control, GitHub, and semantic versioning, and applies them to the analogous process of developing successive versions of an open source dataset. Moreover, we argue that this model allows individual research groups to achieve a dynamic and versioned model of data delivery at no cost.

Highlights

  • Evolving datasets are those that are often being expanded and improved

  • We suggest adapting the process of semantic versioning, developed for labelling successive releases of software (see semver.org), to datasets; version labels are assigned in a way analogous to that of software, determined by the structure of the dataset and changes in that structure

  • Our package, called datastorr, facilitates access to releases of any evolving dataset hosted on GitHub (Fig. 1)
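
The idea of labelling dataset releases with semantic versions can be illustrated with a small, language-agnostic sketch, written here in Python rather than R. It is not the datastorr implementation. The mapping from kinds of change to version bumps (`BUMP_FOR_CHANGE`) and the function names are hypothetical, chosen by analogy with the MAJOR.MINOR.PATCH rules at semver.org, and should be read as an assumption rather than as the guidelines proposed in the article.

```python
from typing import Iterable, Tuple

Version = Tuple[int, int, int]


def parse_version(label: str) -> Version:
    """Parse a 'MAJOR.MINOR.PATCH' label (optionally prefixed with 'v')."""
    parts = label.lstrip("v").split(".")
    if len(parts) != 3:
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {label!r}")
    major, minor, patch = (int(p) for p in parts)
    return (major, minor, patch)


# Hypothetical mapping from a kind of dataset change to a version bump.
# This is an illustrative assumption, not the article's own guidelines.
BUMP_FOR_CHANGE = {
    "structure_changed": "major",  # e.g. columns renamed or removed
    "records_added": "minor",      # new rows, same structure
    "errors_fixed": "patch",       # corrections to existing values
}


def next_version(current: str, change: str) -> str:
    """Return the label for the next release, given the kind of change."""
    major, minor, patch = parse_version(current)
    bump = BUMP_FOR_CHANGE[change]
    if bump == "major":
        return f"{major + 1}.0.0"
    if bump == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"


def latest(labels: Iterable[str]) -> str:
    """Pick the newest release by numeric, not alphabetical, comparison."""
    return max(labels, key=parse_version)
```

Comparing versions as integer tuples matters once a component reaches two digits: sorted as plain strings, "1.10.0" would come before "1.9.0", whereas `latest` treats it as the newer release.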

Summary

Availability of data and materials

All datasets and code on which the conclusions of the paper rely must be either included in your submission or deposited in publicly available repositories (where available and ethically appropriate), referencing such data using a unique identifier in the references and in the “Availability of Data and Materials” section of your manuscript. Have you met the above requirement as detailed in our Minimum Standards Reporting Checklist?

Datastorr: a workflow and package for delivering successive versions of “evolving data” directly into R.

