Abstract

BLAST is a central application in bioinformatics and has therefore been the subject of numerous acceleration studies. The de facto standard version of this code, NCBI BLAST, uses complex heuristics that make it challenging to achieve both high performance and exact agreement with the original output. In previous work, we used novel FPGA-based filters that reduce the input database by over 99.99% without loss of sensitivity. The present work makes two primary contributions. The first is a new mechanism for coupling two of the filters so that promising alignments can be found in a fraction of the previous time. The second is the pipelining of the three filters. This is a challenging load-balancing problem because the work per filter drops by 5× to 10× at each of the two interfaces between filters. Pipelining the filters has two benefits: it removes the need to reconfigure between passes, and it reduces the off-chip bandwidth requirement. Together, these two enhancements more than double the performance of the previous best implementation. We currently have CAAD BLASTP working on Virtex-6 and Stratix-IV FPGAs with speed-ups of 9× and 15×, respectively, over the multithreaded original code running on an 8-core PC. We discuss the FPGA features that cause this performance disparity. CAAD BLASTP scales easily and is appropriate for use in large FPGA-based servers.
