Abstract

Tridiagonal linear systems are important to many problems in numerical analysis and computational fluid dynamics, as well as to computer graphics applications in video games and computer-animated films. Typical applications require solving hundreds or thousands of tridiagonal systems, which accounts for the majority of total computation time. Fast parallel solutions are critical to larger scientific simulations, interactive computation of special effects in films, and real-time applications in video games. This chapter describes the performance of multiple tridiagonal algorithms on a graphics processing unit (GPU). It presents a novel hybrid algorithm that combines a work-efficient algorithm with a step-efficient algorithm in a way well suited to the GPU architecture. The hybrid solver achieves 8× and 2× speedups in single precision and double precision, respectively, over a multithreaded, highly optimized CPU solver, and a 2×–2.3× speedup over a basic GPU solver. Future work includes handling non-power-of-two system sizes, effectively supporting system sizes larger than 1024, and designing solutions that can partially take advantage of shared memory even when the entire system cannot fit into shared memory.
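
For readers unfamiliar with the problem, the sketch below shows what a single tridiagonal solve computes, using the standard serial Thomas algorithm. It is offered only as a reference point and is not one of the GPU algorithms the chapter evaluates. The function name `solve_tridiagonal` and the array layout are assumptions made for illustration, and the code assumes a system that needs no pivoting (for example, a diagonally dominant matrix).

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve one tridiagonal system with the serial Thomas algorithm.

    Assumed layout (hypothetical, for illustration only):
      lower[i] multiplies x[i-1]  (lower[0] is unused)
      diag[i]  multiplies x[i]
      upper[i] multiplies x[i+1]  (upper[-1] should be 0)
    No pivoting is performed, so the matrix is assumed to be
    well conditioned, e.g. diagonally dominant.
    """
    n = len(diag)
    c = [0.0] * n  # modified upper diagonal
    d = [0.0] * n  # modified right-hand side

    # Forward elimination: remove the lower diagonal row by row.
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom

    # Back substitution: recover x from the last unknown upward.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x
```

Each row in both loops depends on the result of the previous row, so this method takes a number of sequential steps proportional to the system size. That dependency is why parallel solvers rely on different algorithms rather than a direct port of this loop.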
