Abstract
Over the course of interactions with various application teams, the need for batched sparse linear algebra functions has emerged in order to make more efficient use of GPUs for many small sparse linear algebra problems. In this paper, we present our recent work on a batched sparse direct solver for GPUs. The sparse LU factorization is computed level by level of the elimination tree, leveraging batched dense operations at each level together with a new batched Scatter GPU kernel. The sparse triangular solve is computed by the level sets of the directed acyclic graph (DAG) of the triangular matrix. Batched operations overcome the large overhead associated with launching many small kernels. For medium-sized matrix batches with moderate bandwidth, using an NVIDIA A100 GPU, our new batched sparse direct solver is orders of magnitude faster than a batched banded solver and uses less than one-tenth of the memory.
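To illustrate the level-set idea mentioned for the sparse triangular solve, the sketch below groups the rows of a lower-triangular matrix into levels of its dependency DAG; all rows within one level are independent and could be processed in a single batched/parallel step. This is a minimal illustration assuming a CSR-style pattern input; the function name and data layout are hypothetical, not the paper's actual implementation.

```python
def level_sets(n, col_idx, row_ptr):
    """Return a list of levels; level k holds the rows solvable in step k.

    col_idx/row_ptr give the strictly-lower-triangular sparsity pattern
    of L in CSR form: row i depends on the unknowns at columns
    col_idx[row_ptr[i]:row_ptr[i+1]], which must be solved first.
    (Illustrative data layout, assumed for this sketch.)
    """
    level = [0] * n
    for i in range(n):
        deps = col_idx[row_ptr[i]:row_ptr[i + 1]]
        # A row's level is one more than the deepest row it depends on.
        level[i] = 1 + max((level[j] for j in deps), default=0)
    nlev = max(level, default=0)
    sets = [[] for _ in range(nlev)]
    for i in range(n):
        sets[level[i] - 1].append(i)
    return sets

# 4x4 example with strictly-lower entries at (1,0), (2,1), (3,0):
# row 0 has no dependencies, rows 1 and 3 depend only on row 0,
# and row 2 depends on row 1.
print(level_sets(4, [0, 1, 0], [0, 0, 1, 2, 3]))
# -> [[0], [1, 3], [2]]
```

On a GPU, each inner list would map to one batched kernel launch, which is how the level-set schedule amortizes launch overhead across many small rows.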