A Novel Compute-Efficient Tridiagonal Solver for Many-Core Architectures

Kan Liu, Wei Xue

doi:10.1109/tpds.2022.3214762

Abstract

The tridiagonal solver is an important kernel and is widely supported in mainstream numerical libraries. Although parallel algorithms have been studied for many-core architectures, the performance of current algorithms and implementations is still hindered by sensitivity to input size and limited cross-platform portability. In this paper, we propose a novel algorithm, WM-pGE, for the batched solution of diagonally dominant tridiagonal systems. The algorithm balances the key design objectives, including computational complexity, memory complexity, parallelism, and input size sensitivity, better than existing algorithms. Moreover, we present an elegant formulation that supports implementation and cross-platform optimization without loss of efficiency or generality by isolating the platform-dependent work in only four vector operators. Results from our batched tridiagonal experiments show that the proposed algorithm outperforms the prior work PCR-pThomas by 25% in single precision and 12% in double precision on the NVIDIA Tesla V100. On Intel KNL, our method achieves a 10% performance improvement over PCR-pThomas in double precision.
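For readers unfamiliar with the problem setting, the sketch below illustrates what a batched solution of diagonally dominant tridiagonal systems means. It uses the classic sequential Thomas algorithm applied independently to each system in a batch. It is not WM-pGE or PCR-pThomas, and the function names `thomas_solve` and `batched_thomas` are hypothetical, chosen only for this illustration.

```python
def thomas_solve(a, b, c, d):
    """Solve one tridiagonal system with the classic Thomas algorithm.

    a: sub-diagonal (a[0] is unused), b: main diagonal,
    c: super-diagonal (c[-1] is unused), d: right-hand side.
    Assumes diagonal dominance, so no pivoting is performed.
    """
    n = len(b)
    cp = [0.0] * n  # modified super-diagonal
    dp = [0.0] * n  # modified right-hand side

    # Forward elimination.
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom

    # Back substitution.
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def batched_thomas(systems):
    """Solve a batch of independent systems, each given as (a, b, c, d).

    On a many-core device each system (or each part of one) would be
    assigned to parallel threads; here the batch is processed in a loop.
    """
    return [thomas_solve(a, b, c, d) for (a, b, c, d) in systems]
```

Each system in the batch is independent, which is the source of parallelism that batched solvers exploit. The sequential recurrence inside `thomas_solve` is the part that parallel algorithms restructure.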

Full Text