Abstract

Extraordinary progress in genome sequencing technologies has led to a tremendous increase in the number of sequenced genomes. However, biologists have run into a computational bottleneck to assemble large and complex genomes quickly, due to the lack of scalable and parallel de novo assembly algorithms. Among several approaches to assembly, the iterative de Bruijn graph (DBG) assemblers, such as IDBA-UD, generate high-quality assemblies by sequentially iterating from small to large k-values used in graph construction. However, this approach is time intensive because the creation of the graphs for increasing k-values proceeds sequentially. For example, with just eight k-values, graph construction takes 96% of the total time to assemble a metagenomic dataset with 33 million paired-end reads. In this paper, we propose ScalaDBG, which transforms the sequential process of DBG construction for a range of k values, to one where each graph is built independently and in parallel. We develop a novel mechanism whereby the graph for the higher k value can be patched with contigs generated from the graph with the lower k value. We show that for a variety of datasets our technique can assemble complex genomes much faster than IDBA-UD (6.7X faster for the most complex genome in our dataset) while maintaining the same accuracy for the assembled genome. Moreover, ScalaDBG's multi-level parallelism allows it to simultaneously leverage the power of mighty server machines by using all their cores and of compute clusters by scaling out.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.