Abstract

SUMMARYWith fierce competition between CPU and graphics processing unit (GPU) platforms, performance evaluation has become the focus of various sectors. In this paper, we take a well‐known algorithm in the field of biosequence matching and database searching, the Smith–Waterman (S‐W) algorithm as an example, and demonstrate approaches that fully exploit its performance potentials on CPU, GPU, and field‐programmable gate array (FPGA) computing platforms. For CPU platforms, we perform two optimizations, single instruction, multiple data and multithread, with compiler options, to gain over 70 × speedups over naive CPU versions on quad‐core CPU platforms. For GPU platforms, we propose the combination of coalesced global memory accesses, shared memory tiles, and loop unfolding, achieving 50 × speedups over initial GPU versions on an NVIDIA GeForce GTX 470 card. Experimental results show that the GPU GTX 470 gains 12 × speedups, instead of 100 × reported by some studies, over Intel quadcore CPU Q9400, under the same manufacturing technology and both with fully optimized schemes. In addition, for FPGA platforms, we customize a linear systolic array for the S‐W algorithm in a 45‐nm FPGA chip from Xilinx (XC6VLX760), with up to 1024 processing elements. Under only 133 MHz clock rate, the FPGA platform reaches the highest performance and becomes the most power‐efficient platform, using only 25 W compared with 190 W of the GPU GTX 470. Copyright © 2011 John Wiley & Sons, Ltd.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call