Linear time complexity de novo long read genome assembly with GoldRush

Johnathan Wong, Lauren Coombe, Vladimir Nikolić, Emily Zhang, Ka Ming Nip, Puneet Sidhu, René L Warren, Inanç Birol

doi:10.1038/s41467-023-38716-x

Open Access

https://doi.org/10.1038/s41467-023-38716-x

Abstract

Current state-of-the-art de novo long read genome assemblers follow the Overlap-Layout-Consensus paradigm. Although read-to-read overlap, its most costly step, has been improved in modern long read genome assemblers, these tools still often require excessive RAM when assembling a typical human dataset. Our work departs from this paradigm, forgoing all-vs-all sequence alignments in favor of a dynamic data structure implemented in GoldRush, a de novo long read genome assembly algorithm with linear time complexity. We tested GoldRush on Oxford Nanopore Technologies long sequencing read datasets with different base error profiles sourced from three human cell lines, rice, and tomato. Here, we show that GoldRush achieves assembly scaffold NGA50 lengths of 18.3-22.2, 0.3, and 2.6 Mbp for the genomes of human, rice, and tomato, respectively, and assembles each genome within a day, using at most 54.5 GB of random-access memory, demonstrating the scalability of our genome assembly paradigm and its implementation.
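
The abstract reports NGA50 lengths without defining the metric. As commonly defined by assembly evaluation tools, NGA50 is the length of the aligned block such that blocks of that length or longer cover at least half of the reference genome, where aligned blocks are obtained by aligning the assembly to a reference and breaking sequences at misassemblies. The following is a minimal sketch of that calculation, not the evaluation pipeline used by the authors. It assumes the aligned block lengths have already been computed; the function name `nga50` and the example numbers are hypothetical.

```python
def nga50(aligned_block_lengths, genome_size):
    """Return the NGA50 for a list of aligned block lengths.

    Blocks are visited from longest to shortest. The result is the length
    of the block at which the running total first reaches half of the
    genome size. Returns 0 if the blocks cover less than half the genome.
    """
    half_genome = genome_size / 2
    running_total = 0
    for length in sorted(aligned_block_lengths, reverse=True):
        running_total += length
        if running_total >= half_genome:
            return length
    return 0


# Hypothetical toy input: five aligned blocks against a 200 bp reference.
toy_blocks = [50, 40, 30, 20, 10]
toy_result = nga50(toy_blocks, genome_size=200)
```

With the toy input, the running totals are 50, 90 and 120, so the third block (length 30) is the first to reach half of the 200 bp reference.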

Journal: Nature Communications
Publication Date: May 22, 2023
Citations: 13
License type: open-access

Similar Papers

Complete Genome Sequences of Four Strains of Erwinia tracheiphila: A Resource for Studying a Bacterial Plant Pathogen with a Highly Complex Genome
Breah Lasarre ... Gwyn A Beattie
Molecular Plant-Microbe Interactions® | VOL. 35
01 May 2022

Probabilistic Algorithms for Election Result Prediction
Shibu Kumar K.B. ... Rajeev K.K.
01 Sep 2014

MOD-CHAR: an implementation of Char's spanning tree enumeration algorithm and its complexity analysis
R Jayakumar ... K Thulasiraman
IEEE Transactions on Circuits and Systems | VOL. 36
01 Jan 1989

A parallel variant of a heuristical algorithm for graph coloring — Corrigendum
Janez Žerovnik ... Matjaž Kaufman
Parallel Computing | VOL. 18
01 Aug 1992
