Tridiagonal GPU Solver with Scaled Partial Pivoting at Maximum Bandwidth

Christoph Klein, Robert Strzodka

doi:10.1145/3472456.3472484

Abstract

Partial pivoting is the method of choice to ensure stability in matrix factorizations performed on CPUs. For sparse matrices, this has not been implemented on GPUs so far because of problems with data-dependent execution flow. This work incorporates scaled partial pivoting into a tridiagonal GPU solver in such a way that no SIMD divergence occurs despite the data-dependent decisions. The cost of the computation is completely hidden behind the data movement, which itself runs at maximum bandwidth. Therefore, the cost of the tridiagonal GPU solver is no more than that of the minimally required data movement. For large single-precision systems with 2^25 unknowns, speedups of 5x are reported in comparison to the numerically stable tridiagonal solver (gtsv2) of cuSPARSE. The proposed tridiagonal solver is also evaluated as a preconditioner for Krylov solvers of large sparse linear equation systems. As expected, it performs best for problems with strong anisotropies.
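
To make the term "scaled partial pivoting" concrete, the following is a minimal sequential sketch in Python of Gaussian elimination on a tridiagonal system, where at each step the pivot is chosen between the two candidate rows by comparing the leading entries relative to each row's largest original entry. It is a textbook CPU formulation for illustration only, not the parallel GPU algorithm of the paper; the function name `solve_tridiagonal_spp` and the array layout (`a` sub-diagonal, `b` diagonal, `c` super-diagonal, with `a[0]` and `c[n-1]` ignored) are assumptions made here.

```python
def solve_tridiagonal_spp(a, b, c, d):
    """Solve a tridiagonal system with scaled partial pivoting (illustrative sketch).

    a: sub-diagonal (a[0] ignored), b: diagonal,
    c: super-diagonal (c[n-1] ignored), d: right-hand side.
    """
    n = len(b)
    # Row scale: largest absolute entry of each original row.
    s = [max(abs(a[i]) if i > 0 else 0.0,
             abs(b[i]),
             abs(c[i]) if i < n - 1 else 0.0) for i in range(n)]

    # Upper-triangular factor with up to two super-diagonals (fill-in from swaps).
    u0, u1, u2, y = [0.0] * n, [0.0] * n, [0.0] * n, [0.0] * n

    # Current candidate row: entries in columns i and i+1, its rhs and scale.
    p, q, rc, sc = b[0], (c[0] if n > 1 else 0.0), d[0], s[0]

    for i in range(n - 1):
        an, bn = a[i + 1], b[i + 1]
        cn = c[i + 1] if i + 1 < n - 1 else 0.0
        dn, sn = d[i + 1], s[i + 1]

        # Scaled comparison |an|/sn > |p|/sc, cross-multiplied to avoid division by zero.
        if abs(an) * sc > abs(p) * sn:
            # Next row becomes the pivot row; the current row is eliminated.
            u0[i], u1[i], u2[i], y[i] = an, bn, cn, dn
            m = p / an
            p, q, rc = q - m * bn, -m * cn, rc - m * dn
            # sc stays: the surviving row keeps its original scale.
        else:
            # Current row is the pivot row; the next row is eliminated.
            u0[i], u1[i], u2[i], y[i] = p, q, 0.0, rc
            m = an / p
            p, q, rc, sc = bn - m * q, cn, dn - m * rc, sn

    u0[n - 1], y[n - 1] = p, rc

    # Back substitution with two super-diagonals.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        acc = y[i]
        if i + 1 < n:
            acc -= u1[i] * x[i + 1]
        if i + 2 < n:
            acc -= u2[i] * x[i + 2]
        x[i] = acc / u0[i]
    return x
```

In this sketch the pivot decision is an ordinary branch. The abstract's point is that the GPU solver makes the same kind of data-dependent choice without causing SIMD divergence; how that is achieved is described in the paper itself and is not reproduced here.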

Full Text