A hybrid short read mapping accelerator

Yupeng Chen, Bertil Schmidt, Douglas L Maskell

doi:10.1186/1471-2105-14-67

Open Access

https://doi.org/10.1186/1471-2105-14-67

Abstract

Background: The rapid growth of short read datasets poses a new challenge to the short read mapping problem in terms of sensitivity and execution speed. Existing methods often use a restrictive error model for computing the alignments to improve speed, whereas more flexible error models are generally too slow for large-scale applications. A number of short read mapping software tools have been proposed. However, designs based on hardware are relatively rare. Field programmable gate arrays (FPGAs) have been successfully used in a number of specific application areas, such as the DSP and communications domains due to their outstanding parallel data processing capabilities, making them a competitive platform to solve problems that are “inherently parallel”.

Results: We present a hybrid system for short read mapping utilizing both FPGA-based hardware and CPU-based software. The computation intensive alignment and the seed generation operations are mapped onto an FPGA. We present a computationally efficient, parallel block-wise alignment structure (Align Core) to approximate the conventional dynamic programming algorithm. The performance is compared to the multi-threaded CPU-based GASSST and BWA software implementations. For single-end alignment, our hybrid system achieves faster processing speed than GASSST (with a similar sensitivity) and BWA (with a higher sensitivity); for pair-end alignment, our design achieves a slightly worse sensitivity than that of BWA but has a higher processing speed.

Conclusions: This paper shows that our hybrid system can effectively accelerate the mapping of short reads to a reference genome based on the seed-and-extend approach. The performance comparison to the GASSST and BWA software implementations under different conditions shows that our hybrid design achieves a high degree of sensitivity and requires less overall execution time with only modest FPGA resource utilization. Our hybrid system design also shows that the performance bottleneck for the short read mapping problem can be changed from the alignment stage to the seed generation stage, which provides an additional requirement for the future development of short read aligners.

Highlights

The rapid growth of short read datasets poses a new challenge to the short read mapping problem in terms of sensitivity and execution speed
The field programmable gate array (FPGA) aligner runs at 200 MHz, which maximizes the I/O data transfer
The performance of our FPGA aligner is compared with GASSST and BWA

Summary

Introduction

The rapid growth of short read datasets poses a new challenge to the short read mapping problem in terms of sensitivity and execution speed. The main task of short read mapping is to align the reads to a given reference genome. Mapping this large volume of data is a challenge for existing sequence alignment tools. The basic idea of the seed-and-extend approach is simple: since only a limited number of errors are allowed for a significant alignment, long exact match regions must exist between a read and its target location. Discovering these exact matches (called common k-mers or seeds) before the alignment process can greatly reduce the search space. Seeds are usually detected in one of two ways: (i) indexing the input read dataset and scanning through the reference genome, or (ii) indexing the reference genome and aligning each read independently.
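
As an illustration of approach (ii), the following is a minimal Python sketch of seed detection with a hash-based k-mer index of the reference. It is not the paper's FPGA design or the method used by GASSST or BWA; the function names, the toy sequences, and the choice of k = 4 are assumptions made for the example only.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def find_seeds(read, index, k):
    """Return (read_offset, reference_position) pairs for each exact k-mer match."""
    seeds = []
    for offset in range(len(read) - k + 1):
        # .get avoids inserting empty entries into the defaultdict
        for ref_pos in index.get(read[offset:offset + k], []):
            seeds.append((offset, ref_pos))
    return seeds


def candidate_starts(seeds):
    """Collapse seeds to the reference positions where the read would begin."""
    return sorted({ref_pos - offset for offset, ref_pos in seeds})


# Hypothetical toy data: the read equals reference[8:17] with one mismatch.
reference = "ACGTTGCAAGGCTTACGATCG"
read = "AGGCTAACG"
k = 4

index = build_kmer_index(reference, k)
seeds = find_seeds(read, index, k)
print(candidate_starts(seeds))
```

Only the candidate positions returned by the seed step would then be passed to the more expensive alignment (extend) stage, which is how the search space is reduced. The one mismatch in the toy read removes the k-mers that overlap it, yet the remaining exact matches still point to the correct location.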

Methods

Results

Conclusion