Comparing Memory-Efficient Genome Assemblers on Stand-Alone and Cloud Infrastructures

Dimitrios Kleftogiannis, Panos Kalnis, Vladimir B Bajic

doi:10.1371/journal.pone.0075505


Open Access

https://doi.org/10.1371/journal.pone.0075505


Journal: PLoS ONE
Publication Date: Sep 27, 2013
Citations: 24
License type: CC BY 4.0

Affiliation: King Abdullah University of Science and Technology

Abstract

A fundamental problem in bioinformatics is genome assembly. Next-generation sequencing (NGS) technologies produce large volumes of fragmented genome reads, which require large amounts of memory to assemble the complete genome efficiently. With recent improvements in DNA sequencing technologies, it is expected that the memory footprint required for the assembly process will increase dramatically and will emerge as a limiting factor in processing widely available NGS-generated reads. In this report, we compare current memory-efficient techniques for genome assembly with respect to quality, memory consumption and execution time. Our experiments prove that it is possible to generate draft assemblies of reasonable quality on conventional multi-purpose computers with very limited available memory by choosing suitable assembly methods. Our study reveals the minimum memory requirements for different assembly programs even when data volume exceeds memory capacity by orders of magnitude. By combining existing methodologies, we propose two general assembly strategies that can improve short-read assembly approaches and result in reduction of the memory footprint. Finally, we discuss the possibility of utilizing cloud infrastructures for genome assembly and we comment on some findings regarding suitable computational resources for assembly.

Highlights

Genome assembly is a fundamental problem in sequence bioinformatics [1], and many assemblers have been developed to date [2,3,4,5,6,7,8,9,10,11,12,13].
Current Next-Generation Sequencing (NGS) technologies deliver the following significant improvements over older methods [14]: (i) the read length has increased to several hundreds or even thousands of base pairs for single-molecule, real-time sequencing; (ii) genome coverage has increased by orders of magnitude; (iii) the sequencing process has become much faster and much cheaper [15]; (iv) whole genome sequencing (WGS) for every organism has become feasible [16]; (v) metagenomics assembly from environmental samples has become possible [17].
Our results show that DiMA is a general strategy for reducing the memory requirements of traditional assemblers.

Summary

Introduction

Genome assembly is a fundamental problem in sequence bioinformatics [1], and many assemblers have been developed to date [2,3,4,5,6,7,8,9,10,11,12,13]. The input for genome assembly is generated using Next-Generation Sequencing (NGS) technologies. A side effect of NGS is the massive amount of raw data it generates, which normally requires computers with very large memories for the assembly process. Traditional short-read assemblers require around 256 GB RAM for datasets with roughly 500 million reads [18]. This problem is expected to worsen because the NGS data generation rate has exceeded expectations based on Moore’s law [19], meaning that the amount of raw data is expected to grow much faster than the capacity of available memory. Despite the practical significance of the problem, existing reviews [1,20,21,22] and comparison studies like Assemblathon [23] and GAGE [18] have focused on the quality of the assembly, not on memory requirements.



