A new PPM model for quality score compression

M Akgun, M S Sagiroglu

doi:10.1109/siu.2013.6531447

Abstract

Next Generation Sequencing (NGS) platforms generate nucleotide sequences together with header data and quality information. These platforms may produce gigabyte-scale datasets, and storing these datasets is the biggest problem of NGS technology. Nucleotide sequences, supporting information and quality scores are stored in FASTQ format. In this paper, we consider the quality scores and propose an algorithm for their lossless compression. We look for a model that gives the lowest entropy on quality score data, and we combine this statistical model with arithmetic coding to compress the quality scores as compactly as possible. We compare its performance with text compression utilities such as bzip2, gzip and ppmd, and with existing compression algorithms for quality scores. We show that our compression algorithm outperforms both groups.
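
The abstract's search for a model with the lowest entropy can be illustrated with a minimal sketch. It is not the authors' PPM model: it only measures the empirical conditional entropy of quality symbols given a fixed number of preceding symbols, which is roughly what an ideal arithmetic coder driven by that static context model would spend per symbol. The function name `conditional_entropy`, the `order` parameter and the sample quality strings are assumptions made for illustration, and the estimate ignores the cost of learning the statistics.

```python
import math
from collections import Counter, defaultdict


def conditional_entropy(lines, order):
    """Empirical entropy in bits per symbol, conditioning each quality
    symbol on up to `order` preceding symbols of the same line.

    Hypothetical helper for illustration; not the model from the paper.
    """
    context_counts = defaultdict(Counter)
    total = 0
    for line in lines:
        for i, symbol in enumerate(line):
            # Context is the (possibly shorter) run of symbols before position i.
            context = line[max(0, i - order):i]
            context_counts[context][symbol] += 1
            total += 1
    if total == 0:
        return 0.0
    bits = 0.0
    for counts in context_counts.values():
        n = sum(counts.values())
        for c in counts.values():
            # Ideal code length for c occurrences of a symbol with probability c / n.
            bits -= c * math.log2(c / n)
    return bits / total


if __name__ == "__main__":
    # Made-up quality strings in FASTQ style, for illustration only.
    sample = ["IIIIHHHGGF", "IIIHHHGGFF", "IIIIIHHGGF"]
    for k in range(3):
        print(k, round(conditional_entropy(sample, k), 3))
```

Comparing the values for several orders gives a rough idea of how much context helps on a given dataset. An adaptive scheme such as PPM has to learn these statistics while coding, so its actual compressed size will differ from this static estimate.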
