RabbitFX: Efficient Framework for FASTA/Q File Parsing on Modern Multi-Core Platforms.

Hao Zhang, Zekun Yin, Mingkai Wang, Yanjie Wei, Xiaoming Xu, Honglei Song, Weiguo Liu, Qixin Chang, Bertil Schmidt

doi:10.1109/tcbb.2022.3219114

Abstract

The continuous growth of sequencing data has driven the development of a wide variety of bioinformatics tools. However, many of these tools cannot fully exploit the resources of modern multi-core systems, because file parsing becomes a bottleneck that slows their execution. This motivates an efficient method for parsing sequencing data that can harness modern hardware, especially modern CPUs paired with fast storage devices. We have developed RabbitFX, a fast, efficient, and easy-to-use framework for processing biological sequencing data on modern multi-core platforms. It reads FASTA and FASTQ files efficiently by combining a lightweight parsing method with an optimized formatting implementation. Furthermore, we provide user-friendly, modular C++ APIs that can be easily integrated into applications to increase their file parsing speed. As a proof of concept, we have integrated RabbitFX into three I/O-intensive applications: fastp, Ktrim, and Mash. Our evaluation shows that including RabbitFX yields speedups of at least 11.6 (6.6), 2.4 (2.4), and 3.7 (3.2), respectively, over the original versions on plain (gzip-compressed) files. These case studies demonstrate that RabbitFX can be easily integrated into a variety of NGS analysis tools to significantly reduce their runtimes. RabbitFX is open-source software available at https://github.com/RabbitBio/RabbitFX.
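The abstract does not describe RabbitFX's actual API. The block below is only a hypothetical sketch of the general idea of "lightweight" FASTQ parsing. Records are found by scanning a buffer for newlines and are exposed as `std::string_view` slices, so no bytes are copied. An incomplete trailing record is left unconsumed so the next chunk can complete it. The names `FastqRecord` and `parse_fastq_chunk` are illustrative and are not RabbitFX identifiers. The sketch also assumes plain four-line FASTQ records, with no multi-line sequences and no gzip decompression.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string_view>
#include <vector>

// Hypothetical record type (not RabbitFX's): views into the caller's buffer, no copies.
struct FastqRecord {
    std::string_view name;      // header line without the leading '@'
    std::string_view sequence;
    std::string_view quality;
};

// Extracts the next '\n'-terminated line starting at pos and advances pos past it.
// Returns false if no complete line remains in the buffer.
static bool next_line(std::string_view buf, std::size_t& pos, std::string_view& line) {
    std::size_t end = buf.find('\n', pos);
    if (end == std::string_view::npos) return false;
    line = buf.substr(pos, end - pos);
    if (!line.empty() && line.back() == '\r') line.remove_suffix(1);  // tolerate CRLF
    pos = end + 1;
    return true;
}

// Parses every complete 4-line FASTQ record in `chunk` and appends it to `out`.
// Returns the number of bytes consumed. A trailing partial record is not consumed,
// so the caller can prepend it to the next chunk. Throws on a malformed record.
std::size_t parse_fastq_chunk(std::string_view chunk, std::vector<FastqRecord>& out) {
    std::size_t pos = 0;
    while (pos < chunk.size()) {
        const std::size_t start = pos;
        std::string_view header, seq, plus, qual;
        if (!next_line(chunk, pos, header) || !next_line(chunk, pos, seq) ||
            !next_line(chunk, pos, plus) || !next_line(chunk, pos, qual)) {
            return start;  // incomplete record: stop right before it
        }
        if (header.empty() || header.front() != '@' || plus.empty() ||
            plus.front() != '+' || seq.size() != qual.size()) {
            throw std::runtime_error("malformed FASTQ record");
        }
        header.remove_prefix(1);  // drop the '@'
        out.push_back({header, seq, qual});
    }
    return pos;
}
```

A caller that reads a file in fixed-size blocks would move the unconsumed tail to the front of its buffer before refilling it. Because the records only point into that buffer, they must be processed before the buffer is reused. Splitting the input into independent chunks like this is also what lets several threads parse different chunks concurrently.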

Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Publication date: May 1, 2023
Citations: 4

Similar Papers

Short Read (Next-Generation) Sequencing
Jaya Punetha ... Eric P Hoffman
Circulation: Cardiovascular Genetics | VOL. 6 | 14 Jul 2013

Efficient Memory-Mapped I/O on Fast Storage Device
Nae Young Song ... Heon Young Yeom
ACM Transactions on Storage | VOL. 12 | 20 May 2016

BigSeqKit: a parallel Big Data toolkit to process FASTA and FASTQ files at scale.
César Piñeiro ... Juan C Pichel
GigaScience | VOL. 12 | 28 Dec 2022

SimpiTB - a pipeline designed to extract meaningful information from whole genome sequencing data of Mycobacterium tuberculosis complex, allows to combine genomic, phylogenetic and clustering analyses in existing SITVIT databases.
David Couvin ... Wilfried Segretier
Infection, Genetics and Evolution | VOL. 113 | 01 Sep 2023
