Jabba: hybrid error correction for long sequencing reads.

Giles Miclotte, Piet Demeester, Jan Fostier, Mahdi Heydari, Yves Van De Peer, Stephane Rombauts, Pieter Audenaert

doi:10.1186/s13015-016-0075-7

Abstract

Background: Third generation sequencing platforms produce longer reads with higher error rates than second generation technologies. While the improved read length can provide useful information for downstream analysis, underlying algorithms are challenged by the high error rate. Error correction methods in which accurate short reads are used to correct noisy long reads appear to be attractive to generate high-quality long reads. Methods that align short reads to long reads do not optimally use the information contained in the second generation data, and suffer from large runtimes. Recently, a new hybrid error correcting method has been proposed, where the second generation data is first assembled into a de Bruijn graph, on which the long reads are then aligned.

Results: In this context we present Jabba, a hybrid method to correct long third generation reads by mapping them on a corrected de Bruijn graph that was constructed from second generation data. Unique to our method is the use of a pseudo alignment approach with a seed-and-extend methodology, using maximal exact matches (MEMs) as seeds. In addition to benchmark results, certain theoretical results concerning the possibilities and limitations of the use of MEMs in the context of third generation reads are presented.

Conclusion: Jabba produces highly reliable corrected reads: almost all corrected reads align to the reference, and these alignments have a very high identity. Many of the aligned reads are error-free. Additionally, Jabba corrects reads using a very low amount of CPU time. From this we conclude that pseudo alignment with MEMs is a fast and reliable method to map long highly erroneous sequences on a de Bruijn graph.

Highlights

The accurate determination of the DNA sequence of an organism, i.e., establishing the precise order of the nucleotides A, C, G and T in a DNA molecule, is a fundamental and challenging problem in biology.
Whereas LoRDEC relies on shared k-mers to align the long reads to a de Bruijn graph, we explore the idea of using maximal exact matches (MEMs).
Results for proovread on S. cerevisiae have been left out because the computation did not finish within 3 days.
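
To make the notion of a maximal exact match concrete, the following is a minimal sketch in Python. It is not the paper's implementation: it compares a read against a single plain reference string by brute force, whereas the paper matches reads against a de Bruijn graph. The function name `maximal_exact_matches` and the parameter `min_len` are hypothetical names chosen for this illustration.

```python
def maximal_exact_matches(read, ref, min_len):
    """Return (read_pos, ref_pos, length) for every exact match between
    read and ref that cannot be extended to the left or to the right
    and is at least min_len characters long.

    Brute-force illustration only (quadratic in the input sizes).
    """
    mems = []
    for i in range(len(read)):
        for j in range(len(ref)):
            if read[i] != ref[j]:
                continue
            # Skip matches that could be extended to the left:
            # they are covered by a match starting earlier.
            if i > 0 and j > 0 and read[i - 1] == ref[j - 1]:
                continue
            # Extend to the right as far as the characters agree.
            length = 0
            while (i + length < len(read)
                   and j + length < len(ref)
                   and read[i + length] == ref[j + length]):
                length += 1
            if length >= min_len:
                mems.append((i, j, length))
    return mems


# A read with one substitution relative to the reference (T instead of A).
read = "ACGTTGCA"
ref = "ACGTAGCA"
print(maximal_exact_matches(read, ref, min_len=3))
```

In this toy example the single mismatch splits the read into two matches on either side of the error, which is the kind of seed a seed-and-extend approach would start from. Unlike fixed-length k-mers, each match here is as long as the agreement between the two sequences allows.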

Summary

Introduction

Background: The accurate determination of the DNA sequence of an organism, i.e., establishing the precise order of the nucleotides A, C, G and T in a DNA molecule, is a fundamental and challenging problem in biology. This process consists of two steps: (1) sequencing the DNA by means of a chemical process, resulting in a large number of reads and (2) genome assembly, where the reads are processed to reconstruct the complete DNA sequence. A new hybrid error correcting method has been proposed, where the second generation data is first assembled into a de Bruijn graph, on which the long reads are aligned.
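
The idea of assembling short reads into a de Bruijn graph can be sketched as follows. This is a simplified illustration under stated assumptions, not the construction used by Jabba or LoRDEC: nodes are taken to be (k-1)-mers and each k-mer contributes one edge, reverse complements are ignored, and no error correction of the graph is performed. The function name `build_de_bruijn` is hypothetical.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a simple de Bruijn graph from a collection of reads.

    Assumed convention for this sketch: every k-mer in a read adds an
    edge from its (k-1)-character prefix to its (k-1)-character suffix.
    Returns a dict mapping each node to the set of its successors.
    """
    graph = defaultdict(set)
    for read in reads:
        # Slide a window of width k over the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


# Two overlapping short reads that together spell ACGTA.
graph = build_de_bruijn(["ACGT", "CGTA"], k=3)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

Because overlapping reads contribute the same k-mers, they collapse onto a shared path in the graph, so a long read can be aligned to that path rather than to each short read individually.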

Journal: Algorithms for Molecular Biology	Publication Date: May 3, 2016
Citations: 71	License type: cc-by

Similar Papers

Jabba: Hybrid Error Correction for Long Sequencing Reads Using Maximal Exact Matches
Giles Miclotte ... Piet Demeester
01 Jan 2015

A hybrid and scalable error correction algorithm for indel and substitution errors of long reads
Arghya Kusum Das ... Seung-Jong Park
BMC Genomics | VOL. 20
01 Dec 2019

Long Read Error Correction Algorithm Based on the de Bruijn Graph for the Third-generation Sequencing
Bin Hou ... Rongshu Wang
24 Sep 2021

Toward perfect reads: self-correction of short reads via mapping on de Bruijn graphs.
Antoine Limasset ... Jean-François Flot
Bioinformatics | VOL. 36
06 Dec 2019
