Abstract
Modern graphics processing units (GPUs) offer very high computing power at relatively low cost. Nevertheless, designing efficient algorithms for GPUs normally requires additional time and effort, even for experienced programmers. In this work we present a tuning methodology for designing, on CUDA-enabled GPU architectures, index-digit algorithms, that is, algorithms whose data movement can be described as permutations of the digits comprising the indices of the data elements. This methodology, based on two stages identified as GPU resource analysis and operator-string manipulation, is applied to FFT and tridiagonal system solver algorithms, analyzing their performance features and the most suitable solutions. The resulting implementation is compact and outperforms other well-known and commonly used state-of-the-art libraries, with an improvement of up to 19.2 percent over NVIDIA's CUFFT for complex data, and more than 3000 percent over NVIDIA's CUDPP for real-data tridiagonal systems.
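To make the notion of an index-digit permutation concrete, the following is a minimal sketch (not taken from the paper itself) of the classic digit-reversal reordering used by radix-r FFTs: each index is decomposed into its radix-r digits, the digits are permuted, and the recomposed index addresses the data. The function names and the choice of digit-reversal as the permutation are illustrative assumptions.

```python
def index_digits(i, radix, ndigits):
    # Decompose index i into its radix-r digits, least significant first.
    return [(i // radix**k) % radix for k in range(ndigits)]

def permute_index(i, radix, ndigits, perm):
    # perm maps output digit position k -> input digit position perm[k].
    digits = index_digits(i, radix, ndigits)
    return sum(digits[perm[k]] * radix**k for k in range(ndigits))

def digit_reverse(data, radix, ndigits):
    # Digit-reversal permutation (bit reversal when radix == 2):
    # the data reordering that appears in radix-r FFTs.
    n = radix ** ndigits
    perm = list(range(ndigits))[::-1]  # reverse the digit order
    return [data[permute_index(i, radix, ndigits, perm)] for i in range(n)]

# Example: bit-reversal of 8 elements (radix 2, 3 digits).
print(digit_reverse(list(range(8)), 2, 3))  # -> [0, 4, 2, 6, 1, 5, 3, 7]
```

In the paper's framework, compositions of such digit permutations describe the data movement of both the FFT butterflies and the tridiagonal solver, which is what the operator-string manipulation stage reasons about.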
Published in: IEEE Transactions on Parallel and Distributed Systems