Fast inexact mapping using advanced tree exploration on backward search methods.

José Salavert, Joaquín Tárraga, Ignacio Blanquer, Ignacio Medina, Joaquín Dopazo, Andrés Tomás

doi:10.1186/s12859-014-0438-3


Open Access

https://doi.org/10.1186/s12859-014-0438-3


Abstract

Background: Short sequence mapping methods for Next Generation Sequencing consist of a combination of seeding techniques followed by local alignment based on dynamic programming approaches. Most seeding algorithms are based on backward search alignment, using the Burrows-Wheeler Transform, the Ferragina and Manzini Index or Suffix Arrays. All these backward search algorithms have excellent performance, but their computational cost increases sharply when errors are allowed. In this paper, we discuss an inexact mapping algorithm based on pruning strategies for search tree exploration over genomic data.

Results: The proposed algorithm achieves a 13x speed-up over similar algorithms when allowing 6 base errors, including insertions, deletions and mismatches. This algorithm can deal with 400 bp reads with up to 9 errors in a high quality Illumina dataset. In this example, the algorithm works as a preprocessor that reduces the number of reads to be aligned by 55%. Depending on the aligner, the overall execution time is reduced by 20–40%.

Conclusions: Although not intended as a complete sequence mapping tool, the proposed algorithm could be used as a preprocessing step for modern sequence mappers. This step significantly reduces the number of reads to be aligned, shortening the overall alignment time. Furthermore, this algorithm could be used to accelerate the seeding step of already available sequence mappers. In addition, an out-of-core index has been implemented for working with large genomes on systems without expensive memory configurations.
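To make the idea of backward search with a bounded number of errors concrete, the following is a minimal Python sketch, not the authors' implementation. It builds a naive FM-index for a toy reference and explores the search tree depth-first, abandoning a branch as soon as its error budget is exhausted or its suffix-array interval becomes empty. The function names (`build_fm_index`, `backward_search`), the mismatch-only error model and the simple budget-based pruning are all assumptions made for illustration; the paper's algorithm also handles insertions and deletions and uses pruning strategies that this page does not describe.

```python
from collections import Counter


def build_fm_index(text):
    """Build a naive FM-index (suffix array, C table, Occ table) for text + '$'.

    Illustrative only: the sort-based suffix array is far too slow for genomes.
    """
    text += "$"
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT character = character preceding each sorted suffix (wraps to '$').
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # C[c] = number of characters in the text strictly smaller than c.
    counts = Counter(text)
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += counts[c]
    # occ[c][i] = number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return sa, C, occ


def backward_search(pattern, index, max_mismatches=0):
    """Return sorted start positions where pattern occurs with at most
    max_mismatches substitutions (hypothetical mismatch-only model)."""
    sa, C, occ = index
    hits = set()
    # Stack entry: (pattern position to consume next, lo, hi, errors left).
    stack = [(len(pattern) - 1, 0, len(sa), max_mismatches)]
    while stack:
        i, lo, hi, budget = stack.pop()
        if i < 0:
            hits.update(sa[lo:hi])  # whole pattern consumed: report matches
            continue
        for c in C:
            if c == "$":
                continue
            cost = 0 if c == pattern[i] else 1
            if cost > budget:
                continue  # prune: this branch would exceed the error budget
            new_lo = C[c] + occ[c][lo]
            new_hi = C[c] + occ[c][hi]
            if new_lo < new_hi:  # prune: empty interval means no occurrence
                stack.append((i - 1, new_lo, new_hi, budget - cost))
    return sorted(hits)
```

With `max_mismatches=0` only one branch survives per step, which is the usual exact backward search. Each additional allowed error multiplies the number of branches that must be explored, which illustrates why the cost grows so quickly when errors are allowed and why pruning matters.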

Highlights

Short sequence mapping methods for Next Generation Sequencing consist of a combination of seeding techniques followed by local alignment based on dynamic programming approaches
Comparison with other FM-Index-only algorithms: as stated before, our algorithm is not intended as a full sequence mapper, only as a preprocessing step for modern sequence mappers
The purpose of this study is to provide a fair comparison against similar algorithms based only on FM-Index backward search, performing the experiments under the same input, execution arguments and system environment

Summary

Introduction

Short sequence mapping methods for Next Generation Sequencing consist of a combination of seeding techniques followed by local alignment based on dynamic programming approaches. Differences between reads and the reference appear due to natural genetic variability or failures in the sequence digitalisation phase. For this reason, a mapping algorithm must allow a certain number of errors. Several inexact alignment solutions available in the literature focus on dynamic programming approaches, like the Smith-Waterman Algorithm [2,3] (SW) or Hidden Markov Models [4] (HMM). Their computational complexity depends on the length of the read multiplied by the length of the reference genome.
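The cost of the dynamic programming approach can be seen in a short Python sketch of Smith-Waterman local alignment scoring. This is a textbook version with a linear gap penalty, given only for illustration; the function name and the scoring values (`match=2`, `mismatch=-1`, `gap=-2`) are assumptions and do not come from the paper.

```python
def smith_waterman_score(read, reference, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between read and reference.

    Scoring values are illustrative assumptions (linear gap penalty).
    Only two rows of the matrix are kept, so memory is O(len(reference)).
    """
    prev = [0] * (len(reference) + 1)
    best = 0
    for i in range(1, len(read) + 1):
        curr = [0] * (len(reference) + 1)
        for j in range(1, len(reference) + 1):
            same = read[i - 1] == reference[j - 1]
            diag = prev[j - 1] + (match if same else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

The two nested loops visit every (read position, reference position) pair once, so the running time is proportional to the read length multiplied by the reference length. For a whole genome as the reference this is prohibitive per read, which is why mappers restrict dynamic programming to the candidate regions found by seeding.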



Journal: BMC Bioinformatics
Publication Date: Jan 28, 2015
Citations: 30
License type: CC BY 4.0


