Abstract

HPG Aligner applies suffix arrays for DNA read mapping. This implementation produces a highly sensitive and extremely fast mapping of DNA reads that scales up almost linearly with read length. The approach presented here is faster (over 20× for long reads) and more sensitive (over 98% in a wide range of read lengths) than the current state-of-the-art mappers. HPG Aligner is not only an optimal alternative for current sequencers but also the only solution available to cope with longer reads and growing throughputs produced by forthcoming sequencing technologies.

Availability and implementation: https://github.com/opencb/hpg-aligner

Contact: jdopazo@cipf.es or imedina@ebi.ac.uk

Supplementary information: Supplementary data are available at Bioinformatics online.

Highlights

  • Among the many applications of the high-throughput sequencing (HTS) technologies, DNA resequencing is probably the most extensively used because of its important clinical implications (Biesecker, 2010)

  • Clusters of the extended seeds define the candidate alignment locations (CALs), i.e. regions that correspond to highly probable mappings of a read

  • This implementation exploits the multiple cores of the CPUs and, within them, the Streaming SIMD Extensions (SSE) registers to achieve two levels of parallelization: (i) inter-core parallelization, by distributing batches of pairs of query sequence and reference gap sequence to be aligned among multiple cores/threads in the processor, and (ii) intra-core parallelization (Rognes and Seeberg, 2000), by processing a batch of sequence pairs using the SSE registers within a core
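
As a rough illustration of the second highlight, the sketch below groups seed hits that fall close together on the reference into candidate regions. It is only a sketch under stated assumptions: the function name `cluster_seeds`, the `max_gap` threshold and the tuple layout are hypothetical and are not taken from HPG Aligner, whose actual clustering criteria are not described in this summary.

```python
def cluster_seeds(hits, max_gap=50):
    """Merge nearby seed hits into candidate regions.

    hits    -- list of (ref_start, ref_end) positions of extended seeds,
               assumed to lie on the same chromosome and strand
    max_gap -- hypothetical threshold: seeds separated by at most this
               many bases are placed in the same cluster
    Returns a list of (region_start, region_end, n_seeds) tuples.
    """
    regions = []
    for start, end in sorted(hits):
        if regions and start - regions[-1][1] <= max_gap:
            # Close enough to the previous region: extend it.
            prev_start, prev_end, n_seeds = regions[-1]
            regions[-1] = (prev_start, max(prev_end, end), n_seeds + 1)
        else:
            # Too far away: open a new candidate region.
            regions.append((start, end, 1))
    return regions


if __name__ == "__main__":
    seed_hits = [(100, 120), (130, 150), (1000, 1020), (140, 160)]
    for region in cluster_seeds(seed_hits):
        print(region)
```

Regions supported by more seeds would presumably be better mapping candidates, which is why the sketch keeps a seed count for each region.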

Summary

INTRODUCTION

Among the many applications of high-throughput sequencing (HTS) technologies, DNA resequencing is probably the most extensively used because of its important clinical implications (Biesecker, 2010). While the accuracy of short-read mapping is quite reasonable, speed remains an issue. Given the way in which available mappers implement current state-of-the-art mapping algorithms, such as the Burrows-Wheeler Transform, accuracy usually drops as read length increases because of the accumulation of errors. Approaches that overcome these current and future problems are needed, given that the trend in HTS technologies is to increase read length and throughput (Watson, 2014). Suffix arrays (SAs) have recently started to be applied to accelerate DNA (Bussotti et al., 2011; Chen et al., 2013) or RNA (Dobin et al., 2013) read mapping. We propose an approach, based on SAs (Manber and Myers, 1993), that enormously increases the mapping speed without sacrificing accuracy for an ample range of read lengths.
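
To make the suffix array idea concrete, the following minimal sketch builds an SA for a short reference string and locates every exact occurrence of a seed by binary search. It illustrates the general technique only and is not HPG Aligner's implementation: the function names are hypothetical, and the naive construction used here would be far too slow for a real genome.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text in sorted order.

    Naive construction by sorting suffix slices; for illustration only.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return the sorted positions where pattern occurs exactly in text."""
    m = len(pattern)

    # First suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # First suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Every suffix in sa[first:lo] starts with pattern.
    return sorted(sa[first:lo])


if __name__ == "__main__":
    reference = "ACGTACGTTACG"
    sa = build_suffix_array(reference)
    print(find_occurrences(reference, sa, "ACG"))
```

Each lookup needs a number of string comparisons proportional to the logarithm of the reference length, and all matches for a seed end up in one contiguous block of the array.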

METHODS
RESULTS
Simulated data
Real datasets
Program availability
