High-Throughput Compression of FASTQ Data with SeqDB

Mark Howison

doi:10.1109/tcbb.2012.160

Abstract

Compression has become a critical step in storing next-generation sequencing (NGS) data sets because of both the increasing size and decreasing costs of such data. Recent research into efficiently compressing sequence data has focused largely on improving compression ratios. Yet, the throughputs of current methods now lag far behind the I/O bandwidths of modern storage systems. As biologists move their analyses to high-performance systems with greater I/O bandwidth, low-throughput compression becomes a limiting factor. To address this gap, we present a new storage model called SeqDB, which offers high-throughput compression of sequence data with minimal sacrifice in compression ratio. It achieves this by combining the existing multithreaded Blosc compressor with a new data-parallel byte-packing scheme, called SeqPack, which interleaves sequence data and quality scores.
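The abstract does not spell out the byte layout SeqPack uses, so the following is only a minimal sketch of the general idea of storing a base and its quality score together in one byte. It assumes a five-letter alphabet (A, C, G, T, N), Phred scores from 0 to 50, and a Phred+33 ASCII offset; the names `pack`, `unpack`, `BASES`, and `MAX_QUAL` are hypothetical and are not taken from SeqDB.

```python
from typing import Tuple

# Assumed alphabet and score range (not taken from the SeqDB source):
# 5 bases * 51 quality values = 255 distinct codes, which fits in one byte.
BASES = "ACGTN"
MAX_QUAL = 50


def pack(seq: str, qual: str, offset: int = 33) -> bytes:
    """Pack each (base, quality) pair into a single byte."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must have equal length")
    out = bytearray(len(seq))
    for i, (base, q) in enumerate(zip(seq, qual)):
        score = ord(q) - offset
        if not 0 <= score <= MAX_QUAL:
            raise ValueError(f"quality score {score} outside 0..{MAX_QUAL}")
        # BASES.index raises ValueError for a base outside the alphabet.
        out[i] = BASES.index(base) * (MAX_QUAL + 1) + score
    return bytes(out)


def unpack(packed: bytes, offset: int = 33) -> Tuple[str, str]:
    """Recover the sequence and quality strings from packed bytes."""
    seq, qual = [], []
    for byte in packed:
        base_idx, score = divmod(byte, MAX_QUAL + 1)
        seq.append(BASES[base_idx])
        qual.append(chr(score + offset))
    return "".join(seq), "".join(qual)
```

Under these assumptions a read and its quality string occupy one byte per base instead of two, and each byte is encoded independently of its neighbors, which is what makes a scheme of this kind straightforward to parallelize before handing the result to a general-purpose compressor such as Blosc.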

Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Publication Date: Jan 1, 2013
Citations: 29

Similar Papers

Light-weight reference-based compression of FASTQ data.
Yongpeng Zhang ... Zexuan Zhu
BMC Bioinformatics | VOL. 16
09 Jun 2015

Short Read (Next-Generation) Sequencing
Jaya Punetha ... Eric P Hoffman
Circulation: Cardiovascular Genetics | VOL. 6
14 Jul 2013

Detection of Significant Copy Number Variations From Multiple Samples in Next-Generation Sequencing Data.
Xiguo Yuan ... Peizhen Fan
IEEE Transactions on NanoBioscience | VOL. 17
01 Jan 2018

EasyQC: Tool with Interactive User Interface for Efficient Next-Generation Sequencing Data Quality Control.
Vijaya Raghavan Rangamaran ... Kirubagaran Ramalingam
Journal of computational biology : a journal of computational molecular cell biology | VOL. 25
08 Sep 2018
