Benchmarking of de novo assembly algorithms for Nanopore data reveals optimal performance of OLC approaches.

Yesesri Cherukuri, Sarath Chandra Janga

doi:10.1186/s12864-016-2895-8

Open Access

Abstract

Background

Improved DNA sequencing methods have transformed the field of genomics over the last decade. This has become possible due to the development of inexpensive short-read sequencing technologies, which have now resulted in three generations of sequencing platforms. More recently, a fourth generation of Nanopore-based single-molecule sequencing technology was developed around the MinION® sequencer, which is portable, inexpensive and fast. It is capable of generating reads longer than 100 kb. Despite its many advantages, MinION reads have two major limitations: high error rates and the need to develop downstream pipelines. Algorithms for error correction have already emerged, while the development of pipelines is still at a nascent stage.

Results

In this study, we benchmarked available assembler algorithms to find an appropriate framework that can efficiently assemble Nanopore-sequenced reads. To address this, we employed genome-scale Nanopore-sequenced datasets available for the E. coli and yeast genomes. To comprehensively evaluate multiple algorithmic frameworks, we included assemblers based on de Bruijn graphs (Velvet and ABySS), Overlap Layout Consensus (OLC) (Celera) and greedy extension (SSAKE) approaches. We analyzed the quality and accuracy of the assemblies as well as the computational performance of each assembler included in our benchmark. Our analysis showed that the OLC-based algorithm, Celera, could generate a high-quality assembly with ten times higher N50 and mean contig length values, as well as one-fifth the total number of contigs, compared to the other tools. Celera was also found to exhibit an average genome coverage of 12 % in the E. coli dataset and 70 % in the yeast dataset, as well as relatively low run times. In contrast, the de Bruijn graph based assemblers Velvet and ABySS generated assemblies of moderate quality, in less time when there was no limitation on memory allocation, while the greedy extension based algorithm SSAKE generated an assembly of very poor quality but with genome coverage of 90 % on the yeast dataset.

Conclusion

OLC can be considered a favorable algorithmic framework for the development of assembler tools for Nanopore-based data, followed by de Bruijn based algorithms, as they require less or similar run time compared with OLC-based algorithms to generate an assembly, irrespective of the memory allocated for the task. However, a few improvements must be made to the existing de Bruijn implementations in order to generate an assembly of reasonable quality. Our findings should help stimulate the development of novel assemblers for handling Nanopore sequence data.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-016-2895-8) contains supplementary material, which is available to authorized users.
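As a rough illustration of the algorithmic families compared in the abstract, the sketch below shows two primitives they build on: k-mer decomposition into a de Bruijn graph, and suffix-prefix overlap detection as used in OLC and greedy extension. The function names, the toy reads and the exact-match overlap are assumptions made for illustration; they are not taken from Velvet, ABySS, Celera or SSAKE, which tolerate sequencing errors and are far more elaborate.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer contributes one edge: its prefix node -> its suffix node.
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def suffix_prefix_overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` equal to a prefix of `b`.

    Returns 0 when no exact overlap of at least `min_len` exists.
    Exact matching is a simplification (assumption): noisy reads need
    approximate alignment instead.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0


# Toy, error-free reads (hypothetical data).
reads = ["ACGTAC", "GTACGG", "ACGGTT"]

graph = de_bruijn_edges(reads, k=4)
print(sorted(graph["ACG"]))  # ['CGG', 'CGT']

overlap = suffix_prefix_overlap(reads[0], reads[1])
print(overlap)  # 4
```

In this toy case the node `ACG` has two successors, which is the kind of branching a de Bruijn assembler has to resolve, while the overlap primitive links whole reads directly.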

Highlights

Improved DNA sequencing methods have transformed the field of genomics over the last decade
Overlap Layout Consensus (OLC) can be considered a favorable algorithmic framework for the development of assembler tools for Nanopore-based data, followed by de Bruijn based algorithms, as they require less or similar run time compared with OLC-based algorithms to generate an assembly, irrespective of the memory allocated for the task
Comparison of the assembly metrics generated by various assemblers reveals Celera as an optimal assembler
The main features that can best explain the quality of an assembly from sequencing reads include the N50 value, number of contigs, mean length of contigs and the total sum of the lengths of all the contigs identified in an assembly
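The four metrics named above can be computed from a list of contig lengths alone. The following is a minimal sketch, assuming the common definition of N50 as the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length; the function name and the example lengths are hypothetical and are not results from the paper.

```python
def assembly_metrics(contig_lengths):
    """Return contig count, total length, mean length and N50 for an assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)

    # Walk from the longest contig down until half the assembly is covered.
    n50 = 0
    running = 0
    for length in lengths:
        running += length
        if running * 2 >= total:
            n50 = length
            break

    return {
        "num_contigs": len(lengths),
        "total_length": total,
        "mean_length": total / len(lengths) if lengths else 0.0,
        "n50": n50,
    }


# Hypothetical contig lengths in base pairs.
metrics = assembly_metrics([100, 80, 60, 40, 20])
print(metrics["n50"])          # 80
print(metrics["mean_length"])  # 60.0
```

A higher N50 and mean contig length together with a lower contig count, as reported for Celera, indicate a less fragmented assembly.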

Summary

Introduction

Improved DNA sequencing methods have transformed the field of genomics over the last decade. Similar improvements were accomplished by the Illumina TruSeq synthetic long-read sequencing strategy [11, 12], but the long-range polymerase chain reaction step included in the library preparation is a limitation in time-constrained projects, making it inaccessible to much of the research community. To overcome such limitations, efforts are being made to develop an inexpensive single-molecule Nanopore-based fourth generation DNA sequencing technology [13,14,15,16,17]. Similar results were observed upon analyzing all three types of reads, illustrating the reproducibility of our results irrespective of the type of reads analyzed.

Journal: BMC Genomics
Publication Date: Aug 1, 2016
Citations: 22
License type: cc-by

Similar Papers

Genome sequence assembly algorithms and misassembly identification methods.
Yue Meng ... Xiao Zhu
Molecular Biology Reports | VOL. 49
23 Sep 2022

Efficient reconfiguration algorithms of de Bruijn and Kautz networks into linear arrays
Rabah Harbane ... Marie-Claude Heydemann
Theoretical Computer Science | VOL. 263
01 Jul 2001

HAssembler: A hybrid de novo genome assembly approach for large genomes
Amit Kairi ... Atmakuri Ramakrishna Rao
The Indian Journal of Agricultural Sciences | VOL. 90
04 Dec 2020

Algorithmic and computational comparison of metagenome assemblers
Anu Sharma ... Neeraj Budhlakoti
The Indian Journal of Agricultural Sciences | VOL. 90
04 Sep 2020
