Abstract

Pre-alignment filters are useful for reducing the computational requirements of genomic sequence mappers. Most of them estimate or compute the edit distance between sequences and their candidate locations in a reference genome, using a subset of the dynamic programming table used to compute the Levenshtein distance. Some of their FPGA implementations use classic HDL toolchains, which limits their portability. Currently, most FPGA accelerators offered by heterogeneous cloud providers support C/C++ HLS. In this work, we implement and optimize several state-of-the-art pre-alignment filters using C/C++-based HLS to extend their portability to a wide range of systems supporting the OpenCL runtime. Moreover, we perform a complete analysis of the performance and accuracy of the filters and discuss the implications of the results. The maximum throughput obtained by an exact filter is 95.1 MPairs/s, including memory transfers, using 100 bp sequences. This is the highest ever reported for a comparable system and more than two times faster than previous HDL-based results. The best energy efficiency obtained from the accelerator (not considering the host CPU) is 2.1 MPairs/J, more than one order of magnitude higher than that of comparable state-of-the-art accelerator-based approaches.
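To illustrate what computing the edit distance on a subset of the dynamic programming table means, the following is a minimal software sketch of a banded Levenshtein check. It is not one of the filters implemented in this work, and it is not HLS code. The function name `passes_filter` and the parameter `max_edits` are hypothetical names chosen for this example. The sketch assumes `max_edits` is non-negative and only evaluates cells within `max_edits` of the main diagonal, since any cell outside that band already implies a distance above the threshold.

```cpp
#include <algorithm>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper (not from the paper): accept a read/candidate pair
// if their Levenshtein distance is at most max_edits (assumed >= 0).
// Only cells with |i - j| <= max_edits are evaluated; all others are
// treated as "above threshold". Written for clarity, not speed.
bool passes_filter(const std::string& read, const std::string& ref, int max_edits) {
    const int n = static_cast<int>(read.size());
    const int m = static_cast<int>(ref.size());
    // The length difference is a lower bound on the edit distance.
    if (std::abs(n - m) > max_edits) return false;

    const int INF = max_edits + 1;  // cap: any larger value means "reject"
    std::vector<int> prev(m + 1, INF), curr(m + 1, INF);
    // Row 0: turning an empty prefix into ref[0..j) costs j insertions.
    for (int j = 0; j <= std::min(m, max_edits); ++j) prev[j] = j;

    for (int i = 1; i <= n; ++i) {
        std::fill(curr.begin(), curr.end(), INF);
        const int lo = std::max(1, i - max_edits);  // first column inside the band
        const int hi = std::min(m, i + max_edits);  // last column inside the band
        if (i <= max_edits) curr[0] = i;            // column 0: i deletions
        for (int j = lo; j <= hi; ++j) {
            const int sub = prev[j - 1] + (read[i - 1] != ref[j - 1] ? 1 : 0);
            const int del = prev[j] + 1;
            const int ins = curr[j - 1] + 1;
            curr[j] = std::min({sub, del, ins, INF});
        }
        std::swap(prev, curr);
    }
    return prev[m] <= max_edits;
}
```

For example, `passes_filter("ACGT", "ACGA", 1)` returns true (one substitution), while the same pair with `max_edits` set to 0 is rejected. A pair rejected here can never align within the threshold, which is the property that lets an exact filter discard candidates before full alignment.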

Highlights

  • More than a decade after the emergence of Next-Generation Sequencing (NGS, [1][2]), genetic sequencing has become an indispensable tool in current medical practice, and it is expected to be even more important in the future

  • As we saw in Section II.A, accuracy has a direct impact on the expected speedup factor

  • We report the throughput of the FPGA HDL-based implementations given in the original papers and the throughput of our best-performing OpenCL implementation on the D5005 and HARPv2 systems

Summary

Introduction

More than a decade after the emergence of Next-Generation Sequencing (NGS, [1][2]), genetic sequencing has become an indispensable tool in current medical practice, and it is expected to be even more important in the future. The human genome is around 3 Gbp long. Current sequencing technologies cannot extract the complete genome of a complex organism as a single sequence; instead, they produce a large set of short subsequences, called reads. The enormous interest in genomic analysis, its introduction as part of regular medical practice, and the continuous price reduction of sequencing machines have produced an enormous increase in the data loads processed by genomic labs. In this context, accelerating all the processes involved in the analysis is fundamental to continuing the mass deployment of the technology ([14]).
