SciDataFlow: a tool for improving the flow of data through science.

Vince Buffalo

doi:10.1093/bioinformatics/btad754

Open Access

Journal: Bioinformatics (Oxford, England)
Publication Date: Jan 2, 2024
License type: CC BY 4.0

Affiliation: University of California, Berkeley

Abstract

Managing data and code in open scientific research is complicated by two key problems: large datasets often cannot be stored alongside code in repository platforms like GitHub, and iterative analysis can lead to unnoticed changes to data, increasing the risk that analyses are based on older versions of data. SciDataFlow is a fast, concurrent command-line tool paired with a simple Data Manifest specification that streamlines tracking data changes, uploading data to remote repositories, and pulling in all data necessary to reproduce a computational analysis. SciDataFlow is available at https://github.com/vsbuffalo/scidataflow.
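
The idea of catching unnoticed data changes can be illustrated with a minimal sketch: record a checksum for each tracked file, then compare the recorded checksums against the files on disk. This is not SciDataFlow's implementation or its Data Manifest format; the function names, the in-memory dictionary standing in for a manifest, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(paths):
    """Map each path to its current digest (hypothetical manifest layout)."""
    return {str(p): file_digest(p) for p in paths}


def changed_files(manifest):
    """Return the sorted tracked paths that are missing or whose contents changed."""
    changed = []
    for path, recorded in manifest.items():
        p = Path(path)
        if not p.exists() or file_digest(p) != recorded:
            changed.append(path)
    return sorted(changed)
```

Under these assumptions, an analysis script could call `changed_files` before running and stop if the list is non-empty, so that results are never silently computed from data that differs from what was recorded.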

Full Text