A novel compression tool for efficient storage of genome resequencing data

Congmao Wang, Dabing Zhang

doi:10.1093/nar/gkr009

Abstract

With advances in DNA sequencing technologies, reference genome sequences are available for a growing number of organisms. Analyzing sequence variation and understanding its biological importance have become major research aims. However, storing and processing the huge amount of eukaryotic genome data, such as those of human, mouse and rice, has become a challenge for biologists. Currently available bioinformatics tools for compressing genome sequence data have limitations, such as requiring a reference single nucleotide polymorphism (SNP) map and information on deletions and insertions. Here, we present a novel compression tool for storing and analyzing Genome ReSequencing data, named GRS. GRS processes genome sequence data without a reference SNP map or other sequence variation information, and automatically rebuilds the individual genome sequence from the reference genome sequence. When tested on the first Korean personal genome sequence data set, GRS achieved ∼159-fold compression, reducing the size of the data from 2986.8 to 18.8 MB. When tested on sequencing data from rice and Arabidopsis thaliana, GRS compressed the rice genome data from 361.0 to 4.4 MB, and the A. thaliana genome data from 115.1 MB to 6.5 KB. This de novo compression tool is available at http://gmdd.shgmo.org/Computational-Biology/GRS.
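The idea behind reference-based compression can be illustrated with a minimal sketch: store only the places where an individual sequence differs from the reference, then rebuild the individual sequence from the reference plus those stored differences. The code below is not the GRS algorithm, which the abstract does not describe; it swaps in Python's standard `difflib.SequenceMatcher` as a stand-in for the difference detection. The function names `encode`, `decode` and `fold`, and the toy sequences, are hypothetical.

```python
from difflib import SequenceMatcher


def encode(reference, individual):
    """Return the edits that turn `reference` into `individual`.

    Each edit is (ref_start, ref_end, replacement_bases). Matching
    regions are not stored, which is where the saving comes from.
    Illustrative only: difflib is an assumption, not what GRS uses.
    """
    matcher = SequenceMatcher(None, reference, individual, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            edits.append((i1, i2, individual[j1:j2]))
    return edits


def decode(reference, edits):
    """Rebuild the individual sequence from the reference and the edits."""
    parts = []
    pos = 0
    for start, end, replacement in edits:
        parts.append(reference[pos:start])  # unchanged stretch of reference
        parts.append(replacement)           # substituted or inserted bases
        pos = end                           # skip replaced or deleted bases
    parts.append(reference[pos:])
    return "".join(parts)


def fold(original_size, compressed_size):
    """Compression fold: original size divided by compressed size."""
    return original_size / compressed_size


if __name__ == "__main__":
    reference = "ACGTACGTTAGCCGATAGGCTA"
    individual = "ACGTACCTTAGCCGAAATAGGTA"  # toy sequence with a few variants

    edits = encode(reference, individual)
    assert decode(reference, edits) == individual  # lossless round trip

    # Sizes in MB reported in the abstract for the Korean genome data set.
    print(round(fold(2986.8, 18.8)))  # 159
```

The round-trip check shows that nothing is lost: the reference plus the edit list is enough to recover the individual sequence exactly. The same `fold` calculation applied to the rice figures (361.0 to 4.4 MB) gives roughly 82-fold compression.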

Journal: Nucleic Acids Research
Publication Date: Jan 25, 2011
Citations: 81
License type: CC BY-NC 2.5
