Abstract

Data analysis often entails a multitude of heterogeneous steps, from the application of various command line tools to the use of scripting languages like R or Python for the generation of plots and tables. It is widely recognized that data analyses should ideally be conducted in a reproducible way. Reproducibility enables technical validation and regeneration of results on the original or even new data. However, reproducibility alone is by no means sufficient to deliver an analysis that is of lasting impact (i.e., sustainable) for the field, or even just for one research group. We postulate that it is equally important to ensure adaptability and transparency. The former describes the ability to modify the analysis to answer extended or slightly different research questions. The latter describes the ability to understand the analysis in order to judge whether it is not only technically but also methodologically valid. Here, we analyze the properties needed for a data analysis to become reproducible, adaptable, and transparent. We show how the popular workflow management system Snakemake can be used to guarantee these properties, and how it enables an ergonomic, combined, unified representation of all steps involved in data analysis, ranging from raw data processing to quality control and fine-grained, interactive exploration and plotting of final results.

Highlights

  • Despite the ubiquity of data analysis across scientific disciplines, it is a challenge to ensure in silico reproducibility[1,2,3]

  • We show how the popular workflow management system Snakemake can be used to guarantee reproducibility, adaptability, and transparency, and how it enables an ergonomic, combined, unified representation of all steps involved in data analysis, ranging from raw data processing to quality control and fine-grained, interactive exploration and plotting of final results

  • We show how data analysis sustainability in terms of these aspects is supported by the open source workflow management system Snakemake


Summary

METHOD ARTICLE

Sustainable data analysis with Snakemake [version 1; peer review: 1 approved, 1 approved with reservations]. Felix Mölder 1,2, Kim Philipp Jablonski 3,4, Brice Letcher 5, Michael B. Tomkins-Tinch 6,7, Vanessa Sochat 8, Jan Forster 1,9, Soohyun Lee 10, Sven O. Twardziok 11, Alexander Kanitz 12,13, Andreas Wilm 14, Manuel Holtgrewe 11,15, Sven Rahmann 16, Sven Nahnsen 17, Johannes Köster 1,18.

18 Jan 2021
