Acceleration of short and long DNA read mapping without loss of accuracy using suffix array.

Joaquín Tárraga,Ignacio Medina,Diego Cazorla,Raul Moreno,José Salavert-Torres,Joaquín Dopazo,Héctor Martínez,Ignacio Blanquer-Espert,Vicente Arnau

doi:10.1093/bioinformatics/btu553

Abstract

HPG Aligner applies suffix arrays for DNA read mapping. This implementation produces a highly sensitive and extremely fast mapping of DNA reads that scales up almost linearly with read length. The approach presented here is faster (over 20× for long reads) and more sensitive (over 98% in a wide range of read lengths) than the current state-of-the-art mappers. HPG Aligner is not only an optimal alternative for current sequencers but also the only solution available to cope with longer reads and growing throughputs produced by forthcoming sequencing technologies.

Availability and implementation: https://github.com/opencb/hpg-aligner

Contact: jdopazo@cipf.es or imedina@ebi.ac.uk

Supplementary information: Supplementary data are available at Bioinformatics online.

Highlights

Among the many applications of the high-throughput sequencing (HTS) technologies, DNA resequencing is probably the most extensively used because of its important clinical implications (Biesecker, 2010)
Clusters of the extended seeds define the candidate alignment locations (CALs), i.e. regions that correspond to highly probable mappings of a read
This implementation exploits the multiple cores of the CPUs and, within them, the Streaming SIMD Extensions (SSE) registers to achieve two levels of parallelization: (i) inter-core parallelization, by distributing batches of pairs of query sequence and reference gap sequence to be aligned among multiple cores/threads in the processor, and (ii) intra-core parallelization (Rognes and Seeberg, 2000), by processing a batch of sequence pairs using the SSE registers within a core
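
The candidate alignment location (CAL) idea in the highlights above can be illustrated with a minimal sketch. It is not HPG Aligner's code: the function name `cluster_seeds`, the `(ref_start, ref_end)` hit representation, and the `max_gap` threshold are all assumptions made for illustration, and the paper's actual clustering criteria are not given in this text. The sketch simply merges seed hits that lie close together on the reference into one candidate region.

```python
def cluster_seeds(seed_hits, max_gap=50):
    """Group seed hits into candidate regions (hypothetical helper).

    seed_hits: iterable of (ref_start, ref_end) reference coordinates.
    max_gap:   assumed maximum distance between consecutive hits that
               still belong to the same candidate region.
    Returns a list of (region_start, region_end, n_seeds) tuples.
    """
    cals = []
    for start, end in sorted(seed_hits):
        if cals and start - cals[-1][1] <= max_gap:
            # Close enough to the previous region: extend it.
            prev_start, prev_end, count = cals[-1]
            cals[-1] = (prev_start, max(prev_end, end), count + 1)
        else:
            # Too far away: open a new candidate region.
            cals.append((start, end, 1))
    return cals


# Toy input: three nearby hits and one distant hit.
hits = [(100, 118), (130, 148), (1000, 1018), (160, 178)]
regions = cluster_seeds(hits)
```

In this toy input the three nearby hits collapse into a single region supported by three seeds, while the distant hit forms its own region. A mapper could then rank regions by seed support before running the more expensive gapped alignment on them.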

Summary

INTRODUCTION

Among the many applications of high-throughput sequencing (HTS) technologies, DNA resequencing is probably the most extensively used because of its important clinical implications (Biesecker, 2010). While the accuracy of short-read mapping is quite reasonable, speed remains an issue. Given the way in which available mappers implement current state-of-the-art mapping algorithms, such as the Burrows-Wheeler Transform, accuracy usually falls as read length increases because of the accumulation of errors. Approaches that overcome these current and future problems are needed, given that the trend in HTS technologies is to increase read length and throughput (Watson, 2014). Suffix arrays (SA) have recently started to be applied to accelerate DNA (Bussotti et al., 2011; Chen et al., 2013) or RNA (Dobin et al., 2013) read mapping. We propose an approach, based on SA (Manber and Myers, 1993), that enormously increases mapping speed without sacrificing accuracy over a wide range of read lengths.
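
To make the suffix array idea concrete, here is a minimal sketch of exact seed lookup on a toy reference. It is an illustration only, not HPG Aligner's implementation: the function names are hypothetical, the construction is a naive sort rather than an efficient algorithm such as that of Manber and Myers, and real mappers add inexact matching and seed extension on top of this step.

```python
def build_suffix_array(text):
    """Return start positions of all suffixes of text in sorted order.

    Naive construction (sorts whole suffixes); adequate for a toy
    reference, far too slow and memory-hungry for a genome.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return sorted positions where pattern occurs exactly in text.

    Two binary searches delimit the block of suffixes whose first
    len(pattern) characters equal the pattern.
    """
    m = len(pattern)

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[first:lo])


reference = "ACGTACGTGACG"
sa = build_suffix_array(reference)
seed_positions = find_occurrences(reference, sa, "ACG")
```

Each lookup takes a number of string comparisons logarithmic in the reference length, which is why a prebuilt index of this kind makes locating seeds fast once the construction cost has been paid.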

METHODS

RESULTS

Simulated data

Real datasets

Program availability

Journal: Bioinformatics (Oxford, England)	Publication Date: Aug 20, 2014
Citations: 24	License type: CC BY 4.0
