Abstract

Modern graphics processing units (GPUs) offer very high computing power at relatively low cost. Nevertheless, designing efficient algorithms for GPUs normally requires additional time and effort, even for experienced programmers. In this work we present a tuning methodology for designing, on CUDA-enabled GPU architectures, index-digit algorithms, that is, algorithms whose data movement can be described as permutations of the digits that comprise the indices of the data elements. The methodology, organized in two stages identified as GPU resource analysis and operator string manipulation, is applied to FFT and tridiagonal system solver algorithms, analyzing their performance features and the most suitable solutions. The resulting implementations are compact and outperform well-known, commonly used state-of-the-art libraries, with improvements of up to 19.2 percent over NVIDIA's complex CUFFT and of more than 3000 percent over NVIDIA's CUDPP for real-data tridiagonal systems.
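As an illustration of the index-digit view (a sketch not taken from the paper itself), consider the bit-reversal step of a radix-2 FFT: writing each data index in radix r as a string of digits, the shuffle is simply a permutation of those digits. The helper below, with hypothetical names, demonstrates this for radix 2:

```python
def digit_reverse(index, radix, num_digits):
    """Reverse the radix-`radix` digits of `index`.

    This is one example of an index-digit permutation: the element at
    position (d_{n-1} ... d_1 d_0) moves to position (d_0 d_1 ... d_{n-1}).
    """
    out = 0
    for _ in range(num_digits):
        index, d = divmod(index, radix)  # peel off the least significant digit
        out = out * radix + d            # append it as the next most significant
    return out

# Radix 2 with 3 digits: the classic bit-reversal permutation of 8 elements,
# i.e., the data reordering used before an iterative radix-2 FFT.
perm = [digit_reverse(i, 2, 3) for i in range(8)]
# perm == [0, 4, 2, 6, 1, 5, 3, 7]
```

In the paper's setting, such digit permutations are realized as data movements across the GPU memory hierarchy, which is why the choice of permutation interacts with resource usage.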
