QuickBAM: a parallelized BAM file access API for high-throughput sequence analysis informatics.

Anders Pitman, Gabor T Marth, Xiaomeng Huang, Yi Qiao

doi:10.1093/bioinformatics/btad463

Abstract

In time-critical clinical settings, such as precision medicine, genomic data needs to be processed as fast as possible to arrive at data-informed treatment decisions in a timely fashion. While sequencing throughput has dramatically increased over the past decade, bioinformatics analysis throughput has not kept pace with improvements in computer hardware, and has consequently become the primary bottleneck. Modern computer hardware is capable of much higher performance than current genomic informatics algorithms typically utilize, which presents opportunities for significant performance improvement. Accessing the raw sequencing data from BAM files, for example, is a necessary and time-consuming step in nearly all sequence analysis tools; however, existing programming libraries for BAM access do not take full advantage of the parallel input/output capabilities of storage devices. In an effort to stimulate the development of a new generation of faster sequence analysis tools, we developed quickBAM, a software library that accelerates sequencing data access by exploiting the parallelism in widely available commodity storage hardware. We demonstrate that analysis software ported to quickBAM consistently outperforms its original version, in some cases finishing an analysis in under 3 min where the original version took 1.5 h on the same storage solution. quickBAM is open source and freely available at https://gitlab.com/yiq/quickbam/. We envision that it will enable a new generation of high-performance informatics tools, either by directly boosting their performance if they are currently bottlenecked on data access, or by allowing data access to keep up with further optimizations in algorithms and compute techniques.
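The general idea of exploiting storage parallelism can be illustrated with a minimal sketch. This is not quickBAM's API or implementation (the abstract does not describe either); it only shows, under stated assumptions, how a file can be split into byte ranges that are requested concurrently rather than one after another. The function names `split_ranges`, `read_range`, and `parallel_read` are hypothetical and chosen for illustration. A real BAM reader would additionally have to align each range to BGZF block boundaries and decompress the blocks, which is omitted here.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def split_ranges(size, n_parts):
    """Divide [0, size) into up to n_parts contiguous (offset, length) ranges."""
    if size <= 0 or n_parts <= 0:
        return []
    step = -(-size // n_parts)  # ceiling division
    return [(off, min(step, size - off)) for off in range(0, size, step)]


def read_range(path, offset, length):
    """Read one byte range through its own file handle."""
    with open(path, "rb") as fh:
        fh.seek(offset)
        return fh.read(length)


def parallel_read(path, n_workers=4):
    """Read a whole file by issuing its byte ranges concurrently.

    Hypothetical helper: the ranges are raw byte offsets, not BGZF-aware.
    """
    ranges = split_ranges(os.path.getsize(path), n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() preserves input order, so the chunks join back correctly.
        chunks = pool.map(lambda r: read_range(path, *r), ranges)
        return b"".join(chunks)
```

Calling `parallel_read("sample.bam", n_workers=8)` (with a placeholder file name) returns the same bytes as a sequential read. Whether it is faster depends on the storage device being able to serve several requests at once, which is the property the abstract refers to.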

Journal: Bioinformatics	Publication Date: Jul 27, 2023
Citations: 1	License type: CC BY 4.0

