Ψ-RA: a parallel sparse index for genomic read alignment

M Oğuzhan Külekci, Rahul Shah, Bojian Xu, Jeffrey Scott Vitter, Wing-Kai Hon

doi:10.1186/1471-2164-12-s2-s7


Open Access


Abstract

Background: Genomic read alignment involves mapping (exactly or approximately) short reads from a particular individual onto a pre-sequenced reference genome of the same species. Because all individuals of the same species share the majority of their genomes, short-read alignment offers an alternative to direct sequencing that is much more efficient for sequencing the genome of a particular individual. Among the many strategies proposed for this alignment process, indexing the reference genome and searching for short reads over the index is a dominant technique. Our goal is to design a space-efficient indexing structure with fast search capability that can handle the massive volume of short reads produced by next-generation high-throughput DNA sequencing technology.

Results: We concentrate on indexing DNA sequences via sparse suffix arrays (SSAs) and propose a new short-read aligner named Ψ-RA (PSI-RA: parallel sparse index read aligner). The motivation for using SSAs is the ability to trade memory against time: the space consumption of the index can be tuned to the available memory of the machine and to the minimum length of the incoming pattern queries. Although SSAs have been studied before for exact matching of short reads, an elegant way to support approximate matching was missing. We provide this by defining the rightmost mismatch criterion, which prioritizes errors towards the end of the reads, where errors are more probable. Ψ-RA supports any number of mismatches in aligning reads. We compare Ψ-RA with some well-known short-read aligners and show that indexing a genome with an SSA is a good alternative to Burrows-Wheeler transform or seed-based solutions.

Conclusions: Ψ-RA is expected to serve as a valuable tool for aligning short reads generated by next-generation high-throughput sequencing technology. Ψ-RA is very fast in exact matching and also supports rightmost approximate matching. The SSA structure on which Ψ-RA is built maps naturally onto modern multicore architectures, so further speed-up can be gained. All the information, including the source code of Ψ-RA, can be downloaded at: http://www.busillis.com/o_kulekci/PSIRA.zip.
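To make the memory/time trade-off concrete, here is a minimal sketch of exact matching with a sparse suffix array. It assumes the SSA keeps only suffixes starting at multiples of a sampling step k (a common way to define an SSA; Ψ-RA's actual layout and search code may differ), and that queries are at least k characters long. An occurrence at position p covers exactly one sampled position p + i with 0 <= i < k, so the sketch searches pattern[i:] for each offset i and verifies the skipped prefix directly. The names build_sparse_suffix_array, ssa_exact_match and ssa_bytes, and the 4-byte entry size, are hypothetical choices for illustration.

```python
from typing import List, Tuple


def build_sparse_suffix_array(text: str, k: int) -> List[int]:
    """Sort only the suffixes that start at multiples of the sampling step k.

    Naive construction by direct suffix comparison; fine for toy inputs only.
    """
    return sorted(range(0, len(text), k), key=lambda i: text[i:])


def _suffix_range(text: str, ssa: List[int], pattern: str) -> Tuple[int, int]:
    """Binary-search the half-open block of sampled suffixes starting with pattern."""
    m = len(pattern)
    lo, hi = 0, len(ssa)
    while lo < hi:  # leftmost suffix whose m-prefix is >= pattern
        mid = (lo + hi) // 2
        if text[ssa[mid]:ssa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(ssa)
    while lo < hi:  # leftmost suffix whose m-prefix is > pattern
        mid = (lo + hi) // 2
        if text[ssa[mid]:ssa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return start, lo


def ssa_exact_match(text: str, ssa: List[int], k: int, pattern: str) -> List[int]:
    """Return all start positions of pattern (length >= k) in text.

    An occurrence at p covers exactly one sampled position p + i with
    0 <= i < k, so we look up pattern[i:] for every offset i and then
    check the skipped prefix pattern[:i] directly against the text.
    """
    if len(pattern) < k:
        raise ValueError("pattern must be at least k characters long")
    hits = set()
    for i in range(k):  # the k lookups are independent of each other
        lo, hi = _suffix_range(text, ssa, pattern[i:])
        for j in range(lo, hi):
            p = ssa[j] - i
            if p >= 0 and text[p:p + i] == pattern[:i]:
                hits.add(p)
    return sorted(hits)


def ssa_bytes(n: int, k: int, bytes_per_entry: int = 4) -> int:
    """Approximate SSA size: ceil(n / k) sampled positions, one integer each."""
    return -(-n // k) * bytes_per_entry
```

For example, build_sparse_suffix_array("ACGTACGTTACGA", 3) followed by ssa_exact_match(text, ssa, 3, "ACG") finds positions 0, 4 and 9. On the space side, ssa_bytes(3_000_000_000, 4) gives 3,000,000,000 bytes for a hypothetical 3 Gbp reference sampled every 4th position, versus 12 GB for a full suffix array with the same 4-byte entries; the price is k lookups per read and no support for reads shorter than k. Because the k lookups are independent, they are one natural place where a multicore implementation could split the work; this sketch runs them sequentially.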

Highlights

Genomic read alignment involves mapping short reads from a particular individual onto a pre-sequenced reference genome of the same species
The dominant solution in genome indexing is the Burrows-Wheeler transform (BWT) [10] of the reference sequence (e.g., [5,11]), in which the reads are searched with the backwards search algorithm introduced in FM-index of Ferragina and Manzini [12]
Based on the fact that errors are more probable towards the end of the reads, we extend exact matching with sparse suffix arrays to include any number of mismatches by defining the rightmost mismatch criterion
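The page does not spell out the rightmost mismatch criterion, so the following is only a sketch under an assumed reading of it: among alignments with at most k mismatches, prefer the one whose first mismatch lies furthest to the right (the longest exact-match prefix), breaking ties by the second mismatch, and so on. It is a brute-force scan over every text position, not Ψ-RA's SSA-based search, and the function names mismatch_offsets and rightmost_alignments are hypothetical.

```python
from typing import List


def mismatch_offsets(text: str, pos: int, read: str) -> List[int]:
    """Offsets within read where it disagrees with text[pos:pos + len(read)]."""
    return [j for j, c in enumerate(read) if text[pos + j] != c]


def rightmost_alignments(text: str, read: str, k: int) -> List[int]:
    """Brute-force search for the best alignments with at most k mismatches.

    Assumed ordering (one reading of the "rightmost mismatch" idea): an
    alignment is better when its first mismatch lies further right; ties are
    broken by the second mismatch, and so on. Absent mismatches are padded
    with len(read), i.e. treated as lying past the end of the read.
    """
    m = len(read)
    best_key = None
    best: List[int] = []
    for pos in range(len(text) - m + 1):
        mm = mismatch_offsets(text, pos, read)
        if len(mm) > k:
            continue  # too many errors for this alignment
        key = tuple(mm + [m] * (k + 1 - len(mm)))
        if best_key is None or key > best_key:
            best_key, best = key, [pos]
        elif key == best_key:
            best.append(pos)
    return best
```

With text "AGGTAACGTC" and read "ACGTA", both position 0 (mismatch at read offset 1) and position 5 (mismatch at offset 4) have a single error, and rightmost_alignments(text, "ACGTA", 1) returns [5] because that error sits nearer the end of the read. Note that under this assumed ordering an alignment with more mismatches can outrank one with fewer if its first error lies further right.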

Summary

Introduction

Genomic read alignment involves mapping (exactly or approximately) short reads from a particular individual onto a pre-sequenced reference genome of the same species. Because all individuals of the same species share the majority of their genomes, short-read alignment offers an alternative to direct sequencing that is much more efficient for sequencing the genome of a particular individual. Among the many strategies proposed for this alignment process, indexing the reference genome and searching for short reads over the index is a dominant technique. Our goal is to design a space-efficient indexing structure with fast search capability that can handle the massive volume of short reads produced by next-generation high-throughput DNA sequencing technology. While some aligners index the reference genome, others rely on hash tables based on q-grams or spaced seeds to perform a quick scan. The dominant solution in genome indexing is the Burrows-Wheeler transform (BWT) [10] of the reference sequence (e.g., [5,11]), in which reads are searched with the backward search algorithm introduced in the FM-index of Ferragina and Manzini [12].
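For comparison with the SSA approach, here is a toy version of the BWT-based backward search mentioned above. It builds the BWT from a naive suffix array, keeps a C table (count of smaller characters) and a plain Occ table, and narrows a suffix-array interval one pattern character at a time, right to left. Real FM-indexes sample or compress Occ and the suffix array; this sketch stores them in full for clarity and is not the implementation of any cited aligner. The function names are hypothetical.

```python
from typing import Dict, List, Tuple


def build_fm_index(text: str) -> Tuple[str, Dict[str, int], Dict[str, List[int]], List[int]]:
    """Toy FM-index pieces: BWT, C table, Occ table, plus the full suffix array."""
    t = text + "$"  # sentinel; '$' sorts before A, C, G and T
    sa = sorted(range(len(t)), key=lambda i: t[i:])  # naive construction
    bwt = "".join(t[i - 1] for i in sa)  # t[-1] is '$' when i == 0
    alphabet = sorted(set(t))
    C: Dict[str, int] = {}  # C[c] = number of characters in t smaller than c
    total = 0
    for c in alphabet:
        C[c] = total
        total += t.count(c)
    # occ[c][i] = number of occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + int(ch == c)
    return bwt, C, occ, sa


def backward_search(pattern: str, C: Dict[str, int],
                    occ: Dict[str, List[int]], n: int) -> Tuple[int, int]:
    """Return the half-open suffix-array interval of suffixes prefixed by pattern."""
    lo, hi = 0, n
    for c in reversed(pattern):  # extend the match one character to the left
        if c not in C:
            return 0, 0
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:
            return 0, 0
    return lo, hi
```

Usage: after bwt, C, occ, sa = build_fm_index("ACGTACGTTACGA"), backward_search("ACG", C, occ, len(bwt)) yields an interval of width 3, and sorting sa over that interval gives positions 0, 4 and 9. Counting needs only C and Occ; locating positions is where the suffix array (or a sample of it) comes back in.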



Journal: BMC Genomics
Publication Date: Jan 1, 2011
Citations: 24
License type: cc-by

Similar Papers

G-SNPM - A GPU-based SNP mapping tool
Alessandro Orro ... Andrea Manconi
EMBnet.journal | VOL. 18 | 09 Nov 2012

Cancer genomics: new software tools making sequencing more accessible.
En-Guo Chen ... Yan Lu
Personalized Medicine | VOL. 11 | 01 Mar 2014

Next-generation massively parallel short-read mapping on FPGAs
Oliver Knodel ... Thomas B Preusser
01 Sep 2011

MapNext: a software tool for spliced and unspliced alignments and SNP detection of short sequence reads
Hua Bao ... Yuanyan Xiong
BMC Genomics | VOL. 10 | 01 Dec 2009
