Pysradb: A Python package to query next-generation sequencing metadata and data from NCBI Sequence Read Archive.

Saket Choudhary

doi:10.12688/f1000research.18676.1

Abstract

The NCBI Sequence Read Archive (SRA) is the primary archive of next-generation sequencing datasets. SRA makes metadata and raw sequencing data available to the research community to encourage reproducibility and to provide avenues for testing novel hypotheses on publicly available data. However, methods to programmatically access these data are limited. We introduce the Python package pysradb, which provides a collection of command-line methods to query and download metadata and data from SRA, utilizing the curated metadata database available through the SRAdb project. We demonstrate the utility of pysradb on multiple use cases for searching and downloading SRA datasets. It is freely available at https://github.com/saketkc/pysradb.

Highlights

Several projects have made efforts to analyze and publish summaries of DNA-[1] and RNA-seq[2,3] datasets.
Obtaining metadata and raw data from the NCBI Sequence Read Archive (SRA)[4] is often the first step towards reanalyzing public next-generation sequencing datasets in order to compare them to private data or test a novel hypothesis.
The NCBI SRA toolkit[5] provides utility methods to download raw sequencing data, while the metadata can be obtained by querying the website or through the Entrez efetch command-line utility[6].

Summary

Introduction

Several projects have made efforts to analyze and publish summaries of DNA-[1] and RNA-seq[2,3] datasets. SRAdb tracks the five main data objects in SRA’s metadata: submission, study, sample, experiment and run. These are mapped to five relational database tables that are made available in SRAdb's SQLite file. The pysradb package[10] builds upon the principles of SRAdb, providing a simple and user-friendly command-line interface for querying metadata and downloading datasets from SRA. It also provides utility functions that help a user perform more granular queries, which are often required when dealing with multiple datasets on a large scale. By enabling both metadata search and download operations at the command line, pysradb aims to bridge the gap in seamlessly retrieving public sequencing datasets and the associated metadata. To simplify installation for the end user, it is available for download through PyPI and bioconda[12].
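To make the five-object model concrete, the following is a minimal sketch of how submission, study, sample, experiment and run records could be laid out as relational tables and joined to list the runs belonging to one study. It uses only Python's standard sqlite3 module with an in-memory database. The column names are a simplified assumption and the accessions are made up for illustration; this is not the actual SRAdb schema, nor is it how pysradb itself is implemented.

```python
import sqlite3

# Hypothetical, heavily simplified schema: one table per SRA object,
# linked by accession columns. The real SRAdb SQLite file has many more
# columns and may name them differently.
SCHEMA = """
CREATE TABLE submission (submission_accession TEXT PRIMARY KEY);
CREATE TABLE study (
    study_accession TEXT PRIMARY KEY,
    submission_accession TEXT,
    study_title TEXT
);
CREATE TABLE sample (sample_accession TEXT PRIMARY KEY, organism TEXT);
CREATE TABLE experiment (
    experiment_accession TEXT PRIMARY KEY,
    study_accession TEXT,
    sample_accession TEXT,
    library_strategy TEXT
);
CREATE TABLE run (run_accession TEXT PRIMARY KEY, experiment_accession TEXT);
"""


def build_toy_db():
    """Create an in-memory database filled with made-up accessions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO submission VALUES ('SRA0001')")
    conn.executemany(
        "INSERT INTO study VALUES (?, ?, ?)",
        [
            ("SRP0001", "SRA0001", "Toy study A"),
            ("SRP0002", "SRA0001", "Toy study B"),
        ],
    )
    conn.executemany(
        "INSERT INTO sample VALUES (?, ?)",
        [
            ("SRS0001", "Homo sapiens"),
            ("SRS0002", "Homo sapiens"),
            ("SRS0003", "Mus musculus"),
        ],
    )
    conn.executemany(
        "INSERT INTO experiment VALUES (?, ?, ?, ?)",
        [
            ("SRX0001", "SRP0001", "SRS0001", "RNA-Seq"),
            ("SRX0002", "SRP0001", "SRS0002", "RNA-Seq"),
            ("SRX0003", "SRP0002", "SRS0003", "WGS"),
        ],
    )
    conn.executemany(
        "INSERT INTO run VALUES (?, ?)",
        [
            ("SRR0001", "SRX0001"),
            ("SRR0002", "SRX0001"),
            ("SRR0003", "SRX0002"),
            ("SRR0004", "SRX0003"),
        ],
    )
    conn.commit()
    return conn


def runs_for_study(conn, study_accession):
    """Return (run, experiment, sample, strategy) rows for one study."""
    query = """
        SELECT r.run_accession, e.experiment_accession,
               s.sample_accession, e.library_strategy
        FROM run AS r
        JOIN experiment AS e
          ON r.experiment_accession = e.experiment_accession
        JOIN sample AS s
          ON e.sample_accession = s.sample_accession
        WHERE e.study_accession = ?
        ORDER BY r.run_accession
    """
    return conn.execute(query, (study_accession,)).fetchall()


if __name__ == "__main__":
    toy = build_toy_db()
    for row in runs_for_study(toy, "SRP0001"):
        print(row)
```

The join walks from run to experiment to sample, which is the kind of study-to-run lookup that is usually needed before downloading raw data for a whole project.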

Methods

Journal: F1000Research	Publication Date: Apr 23, 2019
Citations: 38	License type: CC BY 4.0

Similar Papers

Pysradb: A Python package to query next-generation sequencing metadata and data from NCBI Sequence Read Archive
Ryan K Dale ... Saket Choudhary
26 Apr 2019
F1000Research | VOL. 8

Don't just dump your data and run: Authors should submit as much experimental information as possible when uploading sequence data.
Matheus Sanitá Lima ... David Roy Smith
27 Oct 2017
EMBO reports | VOL. 18

Reproducible acquisition, management and meta-analysis of nucleotide sequence (meta)data using q2-fondue.
Michal Ziemski ... Lina Kim
20 Sep 2022
Bioinformatics | VOL. 38

Buried treasure in a public repository: Mining mitochondrial genes of 32 annelid species from sequence reads deposited in the Sequence Read Archive (SRA).
Genki Kobayashi
29 Nov 2023
PeerJ | VOL. 11
