Abstract

Tridiagonal systems appear in many scientific and engineering problems, such as Alternating Direction Implicit methods, fluid simulation, and the solution of the Poisson equation. This chapter presents the parallelization of the Augmented Block Cimmino Distributed method for solving tridiagonal systems on graphics processing units (GPUs). Because of the special structure of tridiagonal matrices, we investigate a boundary padding technique that eliminates execution branches on GPUs. Various performance optimization techniques, such as memory coalescing, are also incorporated to further enhance performance. We evaluate the performance of our GPU implementation and analyze the effectiveness of each optimization technique. The GPU implementation achieves a speedup of more than 24 times over the CPU version.
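As a rough illustration of the boundary padding idea mentioned above, the sketch below applies a tridiagonal matrix to a vector without treating the first and last rows as special cases. It is not the chapter's GPU kernel or the Augmented Block Cimmino Distributed method itself. The function name `tridiag_matvec_padded` and the storage layout (three arrays of equal length, with `lower[0]` and `upper[-1]` unused) are assumptions made for this example.

```python
def tridiag_matvec_padded(lower, diag, upper, x):
    """Compute y = A x for a tridiagonal A using padded arrays.

    Assumed layout (hypothetical, for illustration only):
      lower[i] = A[i][i-1]  (lower[0] is unused)
      diag[i]  = A[i][i]
      upper[i] = A[i][i+1]  (upper[-1] is unused)
    """
    n = len(diag)
    # Force the out-of-range coefficients to zero.
    lo = [0.0] + list(lower[1:])
    up = list(upper[:-1]) + [0.0]
    # Pad x with one ghost element on each side so x[i-1] and x[i+1]
    # always exist: xp[i + 1] corresponds to x[i].
    xp = [0.0] + list(x) + [0.0]
    # Every row evaluates the same three-term expression, so no
    # "if i == 0" or "if i == n - 1" branch is needed.
    return [lo[i] * xp[i] + diag[i] * xp[i + 1] + up[i] * xp[i + 2]
            for i in range(n)]
```

On a GPU, each index `i` would typically map to one thread. Because all threads then execute the same instruction sequence, the boundary rows no longer cause divergent branches within a warp.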

