Abstract

Motivation: To tackle the problem of huge memory usage associated with de Bruijn graph-based algorithms, upon which some of the most widely used de novo genome assemblers have been built, we released SparseAssembler1. SparseAssembler1 can save as much as 90% memory consumption in comparison with the state-of-art assemblers, but it requires rounds of denoising to accurately assemble genomes. Algorithmetically, we developed an extension of de Bruijn graph structure — 'sparse de Bruijn graphs' — skipping a certain number of intermediate k-mers. In this paper, we introduce a new general model for genome assembly that uses only sparse k-mers. The new model replaces the idea of the de Bruijn graph from the beginning, and achieves similar memory efficiency and much better robustness compared with our previous SparseAssembler1. Results: Based on the sparse k-mers graph model, we develop SparseAssembler2. We demonstrate that the decomposition of reads of all overlapping k-mers, which is used in existing de Bruijn graph genome assemblers, is overly cautious. We introduce a sparse *To whom correspondence should be addressed. k-mer graph structure for saving sparse k-mers, which greatly reduces memory space requirements necessary for de novo genome assembly. In contrast with the de Bruijn graph approach, we devise a simple but powerful strategy, i.e., finding links between the k-mers in the genome and traversing following the links, which can be done by saving only a few k-mers. To implement the strategy, we need to only select some k-mers that may not even be overlapping ones, and build the links between these k-mers indicated by the reads. We can traverse through this sparse k-mer graph to build the contigs, and ultimately complete the genome assembly. Since the new sparse k-mers graph shares almost all advantages of de Bruijn graph, we are able to adapt a Dijkstra-like breadth-first search algorithm, for the new sparse k-mer graph in order to circumvent sequencing errors and resolve polymorphisms. Availability: Programs in both Windows and Linux are available at: https://sites.google.com/site/sparseassembler/. Contact: ma@vandals.uidaho.edu or mpop@umiacs.umd.edu SparseAssembler2: Sparse k-mer Graph for Memory Efficient Genome Assembly

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call