Reproducible acquisition, management and meta-analysis of nucleotide sequence (meta)data using q2-fondue.

Michal Ziemski, Nicholas A Bokulich, Lina Kim, Lena Flörl, Anja Adamov, Jonathan Wren

doi:10.1093/bioinformatics/btac639

Open Access

Journal: Bioinformatics (Oxford, England)
Publication Date: Sep 20, 2022
Citations: 2
License type: CC BY 4.0

Affiliation: ETH Zurich

Abstract

The volume of public nucleotide sequence data has blossomed over the past two decades and is ripe for re- and meta-analyses to enable novel discoveries. However, reproducible re-use and management of sequence datasets and associated metadata remain critical challenges. We created the open source Python package q2-fondue to enable user-friendly acquisition, re-use and management of public sequence (meta)data while adhering to open data principles. q2-fondue allows fully provenance-tracked programmatic access to and management of data from the NCBI Sequence Read Archive (SRA). Unlike other packages allowing download of sequence data from the SRA, q2-fondue enables full data provenance tracking from data download to final visualization, integrates with the QIIME 2 ecosystem, prevents data loss when storage space is exhausted and allows download of (meta)data given a publication library. To highlight its manifold capabilities, we present executable demonstrations using publicly available amplicon, whole genome and metagenome datasets. q2-fondue is available as an open-source BSD-3-licensed Python package at https://github.com/bokulich-lab/q2-fondue. Usage tutorials are available in the same repository. All Jupyter notebooks used in this article are available at https://github.com/bokulich-lab/q2-fondue-examples. Supplementary data are available at Bioinformatics online.

Full Text