Abstract
A multi-GPU parallelization of exact string matching algorithms based on the backward-search procedure by using indexing techniques, such as the Burrows-Wheeler Transform and the FM-Index, is proposed in this paper. To attain an efficient execution on highly heterogeneous parallel platforms, the proposed parallelization adopted an unified OpenCL implementation that allows its execution either in CPUs and in multiple and possibly different GPU devices (e.g., NVIDIA and AMD GPUs) that integrate the targeted platform. Furthermore, the proposed implementation incorporates convenient load-balancing techniques, in order to ensure not only a convenient balance of the involved workload to minimize the resulting processing time, but also the possibility to scale the offered throughput with the number of exploited GPUs. The obtained experimental results showed that the proposed multi-GPU parallelization platform is able to offer significant speedups (greater than 10×, when using one single GPU) when compared to conventional mainstream multi-threaded CPU implementations (Bowtie - 8 threads), and between 5× and 30× when compared to other popular BWT-based aligners, namely BWA and SOAP2, using their multi-threading options. When compared with state of the art GPU implementations (e.g., SOAP3, HPG-BWT, Barracuda and CUSHAW2-GPU), the proposed implementation showed to be able to provide speedups between 2.5× and 5×. The execution of the proposed alignment platform when considering multiple and completely distinct GPU devices demonstrated the ability to efficiently scale the resulting throughput, by offering a convenient load-balancing of the involved processing in the several distinct devices.
Published Version
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have