GTZ: a fast compression and cloud transmission tool optimized for FASTQ files

Yuting Xing, Zhuo Song, Chengkun Wu, Zhenguo Wang, Gen Li, Bolun Feng

doi:10.1186/s12859-017-1973-5

Open Access

https://doi.org/10.1186/s12859-017-1973-5

Journal: BMC Bioinformatics
Publication Date: Dec 1, 2017
Citations: 17
License type: open-access

Affiliation: National University of Defense Technology

Abstract

Background: The dramatic development of DNA sequencing technology is generating truly big data that demands more storage and bandwidth. To speed up data sharing and bring data to computing resources faster and more cheaply, it is necessary to develop a compression tool that can support efficient compression and transmission of sequencing data to cloud storage.

Results: This paper presents GTZ, a compression and transmission tool optimized for FASTQ files. As a reference-free lossless FASTQ compressor, GTZ treats the different lines of a FASTQ record separately, uses adaptive context modelling to estimate their characteristic probabilities, and compresses data blocks with arithmetic coding. GTZ can also compress multiple files or directories at once. Furthermore, as a tool intended for the cloud computing era, it can either save compressed data locally or transmit it directly to the cloud, as the user chooses. We evaluated the performance of GTZ on a diverse set of FASTQ benchmarks. Results show that in most cases it outperforms many other tools in terms of compression ratio, speed and stability.

Conclusions: GTZ is a tool that enables efficient lossless FASTQ data compression and simultaneous data transmission to the cloud. It emerges as a useful tool for NGS data storage and transmission in the cloud environment. GTZ is freely available online at: https://github.com/Genetalks/gtz.

Highlights

The dramatic development of DNA sequencing technology is generating truly big data that demands more storage and bandwidth.
We present GTZ, a lossless and efficient compression tool to be used jointly with cloud computing for large-scale genomic data analyses. GTZ exploits context model technology combined with multiple prediction modelling schemes.
A typical FASTQ file contains four lines per sequence: line 1 begins with the character ‘@’ followed by a sequence identifier; line 2 holds the raw sequence composed of A, C, T, and G; line 3 begins with the character ‘+’ and is optionally followed by the same sequence identifier again; line 4 holds the corresponding quality scores, encoded as ASCII characters, for the sequence characters in line 2.
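The four-line layout described above can be illustrated with a small parser. The following is a minimal sketch, not GTZ's own code: the names `FastqRecord`, `parse_fastq` and `split_streams` are hypothetical, and the sketch assumes the simple unwrapped form of FASTQ in which every record occupies exactly four lines. It groups lines by position and then separates identifiers, sequences and quality scores into their own streams, which is the kind of per-line separation a FASTQ-specific compressor can exploit.

```python
from typing import Dict, Iterable, Iterator, List, NamedTuple


class FastqRecord(NamedTuple):
    identifier: str  # line 1 without the leading '@'
    sequence: str    # line 2: raw bases
    comment: str     # line 3 without the leading '+' (often empty)
    quality: str     # line 4: one ASCII quality character per base


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from four-line FASTQ text; raise ValueError if malformed.

    Assumption: sequences are not wrapped across several lines.
    """
    stripped = (line.rstrip("\r\n") for line in lines)
    for header in stripped:
        if not header:
            continue  # tolerate blank lines between records
        try:
            sequence = next(stripped)
            plus = next(stripped)
            quality = next(stripped)
        except StopIteration:
            raise ValueError("truncated FASTQ record") from None
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("record does not follow the '@' / '+' layout")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        yield FastqRecord(header[1:], sequence, plus[1:], quality)


def split_streams(records: Iterable[FastqRecord]) -> Dict[str, List[str]]:
    """Collect each kind of line into its own list (hypothetical helper)."""
    streams: Dict[str, List[str]] = {
        "identifiers": [],
        "sequences": [],
        "qualities": [],
    }
    for record in records:
        streams["identifiers"].append(record.identifier)
        streams["sequences"].append(record.sequence)
        streams["qualities"].append(record.quality)
    return streams
```

Grouping lines by position rather than scanning for ‘@’ matters because ‘@’ is itself a valid quality character, so a quality line may begin with it.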

Summary

Introduction

The dramatic development of DNA sequencing technology is generating truly big data that demands more storage and bandwidth. To speed up data sharing and bring data to computing resources faster and more cheaply, it is necessary to develop a compression tool that can support efficient compression and transmission of sequencing data to cloud storage. General-purpose compression tools, such as gzip (http://www.gzip.org/), bzip (http://www.bzip.org/) and 7z (www.7-zip.org), have been used to compress NGS data. These tools do not take advantage of the. Fqzcomp [6] estimates character probabilities by order-k context modelling and compresses NGS data in FASTQ format with the help of arithmetic coders.
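To make the idea of order-k context modelling concrete, the sketch below estimates the probability of each base from the k bases preceding it and updates its counts as it goes. It is an illustration under stated assumptions, not the implementation used by Fqzcomp or GTZ: the function name `adaptive_code_length`, the alphabet `ACGTN` and the add-one initial counts are all choices made for this example. Rather than implementing an arithmetic coder, it sums the ideal code length -log2(p) for each symbol, which is the size an arithmetic coder driven by the same probabilities would approach.

```python
import math
from collections import defaultdict
from typing import Dict

ALPHABET = "ACGTN"  # assumed symbol set for this sketch


def adaptive_code_length(sequence: str, order: int = 2) -> float:
    """Return the ideal code length in bits under an adaptive order-k model."""
    if order < 0:
        raise ValueError("order must be non-negative")
    # One count table per context; every symbol starts at 1 so that
    # no symbol is ever assigned probability zero.
    counts: Dict[str, Dict[str, int]] = defaultdict(
        lambda: dict.fromkeys(ALPHABET, 1)
    )
    bits = 0.0
    context = ""
    for symbol in sequence:
        if symbol not in ALPHABET:
            raise ValueError(f"unexpected symbol: {symbol!r}")
        table = counts[context]
        probability = table[symbol] / sum(table.values())
        bits += -math.log2(probability)
        # Adapt: the decoder can make the same update after decoding the
        # symbol, so no probability table needs to be transmitted.
        table[symbol] += 1
        # Slide the context window; order 0 keeps a single empty context.
        context = (context + symbol)[-order:] if order > 0 else ""
    return bits
```

For example, calling `adaptive_code_length("ACGT" * 50, order=2)` and comparing the result with `order=0` shows how a longer context captures the repetition in the input and lowers the estimated size.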

Similar Papers

ECG signal processing: Lossless compression, transmission via GSM network and feature extraction using Hilbert transform
S K Mukhopadhyay ... M Mitra
01 Jan 2013

Light-weight reference-based compression of FASTQ data.
Yongpeng Zhang ... Xiao Yang
BMC Bioinformatics | VOL. 16
09 Jun 2015

Data compression techniques in IoT-enabled wireless body sensor networks: A systematic literature review and research trends for QoS improvement
Ihab Nassra ... Juan V Capella
Internet of Things | VOL. 23
04 May 2023

Data compression via logic synthesis
Luca Amaru ... Giovanni De Micheli
01 Jan 2014
