Improved assembly of noisy long reads by k-mer validation.

Antonio Bernardo Carvalho, Gabriel Goldstein, Eduardo G Dupim

doi:10.1101/gr.209247.116


Open Access

https://doi.org/10.1101/gr.209247.116


Journal: Genome Research	Publication Date: Oct 7, 2016
Citations: 38	License type: cc-by-nc

Affiliation: Universidade Federal do Rio de Janeiro

Abstract

Genome assembly depends critically on read length. Two recent technologies, from Pacific Biosciences (PacBio) and Oxford Nanopore, produce read lengths >20 kb, which yield de novo genome assemblies with vastly greater contiguity than those based on Sanger, Illumina, or other technologies. However, the very high error rates of these two new technologies (∼15% per base) make assembly imprecise at repeats longer than the read length and computationally expensive. Here we show that the contiguity and quality of the assembly of these noisy long reads can be significantly improved at minimal cost by leveraging the low error rate and low cost of Illumina short reads. Namely, k-mers from the PacBio raw reads that are not present in Illumina reads (which account for ∼95% of the distinct k-mers) are deemed sequencing errors and ignored at the seed alignment step. By focusing on the ∼5% of k-mers that are error free, read overlap sensitivity is dramatically increased. Of equal importance, the validation procedure can be extended to exclude repetitive k-mers, which prevents read miscorrection at repeats and further improves the resulting assemblies. We tested the k-mer validation procedure using one long-read technology (PacBio) and one assembler (MHAP/Celera Assembler), but it is very likely to yield analogous improvements with alternative long-read technologies and assemblers, such as Oxford Nanopore and BLASR/DALIGNER/Falcon, respectively.
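
The validation idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation and does not reproduce MHAP or Celera Assembler behavior. The function names, the use of canonical (strand-independent) k-mers, and the `min_count`/`max_count` thresholds are all assumptions made for illustration. It builds a set of k-mers seen in the short reads, optionally drops those that occur too often (a stand-in for "repetitive" k-mers), and then reports which k-mer positions in a long read would survive as candidate seeds.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmers(seq, k):
    """Yield (position, canonical k-mer) pairs, skipping windows with non-ACGT bases."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if set(window) <= set("ACGT"):
            yield i, canonical(window)


def build_valid_kmers(short_reads, k, min_count=1, max_count=None):
    """Collect k-mers observed in the short reads (hypothetical helper).

    min_count: minimum occurrences for a k-mer to be trusted (assumed threshold).
    max_count: if set, k-mers seen more often are treated as repetitive and excluded.
    """
    counts = Counter(km for read in short_reads for _, km in kmers(read, k))
    return {
        km
        for km, count in counts.items()
        if count >= min_count and (max_count is None or count <= max_count)
    }


def validated_seeds(long_read, valid_kmers, k):
    """Return the positions and k-mers of a long read that pass validation."""
    return [(i, km) for i, km in kmers(long_read, k) if km in valid_kmers]


if __name__ == "__main__":
    short_reads = ["ACGGTCA"]       # toy stand-in for accurate short reads
    noisy_long_read = "ACGGTTCA"    # same sequence with one inserted base
    valid = build_valid_kmers(short_reads, k=4)
    for position, kmer in validated_seeds(noisy_long_read, valid, k=4):
        print(position, kmer)
```

A real data set would need a disk-backed or probabilistic k-mer counter rather than an in-memory `Counter`, and the count cutoffs would have to be chosen from the short-read coverage. The sketch only shows the filtering logic.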


