Community-based Matrix Reordering for Sparse Linear Algebra Optimization

Vignesh Balaji, Neal C Crago, Stephen W Keckler, Aamer Jaleel

doi:10.1109/ispass57527.2023.00029

Abstract

Sparse linear algebra kernels achieve sub-optimal performance due to their poor cache locality. Matrix reordering is an effective pre-processing optimization that improves the cache locality and performance of these kernels. While many reordering techniques have been proposed, most prior work on matrix reordering suffers from two key limitations: (1) it evaluates the proposed reordering on a small set of arbitrarily selected inputs and (2) it does not quantify the additional headroom for improvement after reordering is applied. To address these two limitations, we perform a detailed characterization of reordering techniques across a broad set of 50 input matrices, quantifying the ability of matrix reordering techniques to bring sparse linear algebra kernels close to hardware limits. Our analysis reveals that community-based matrix reordering is the most effective at optimizing the execution of sparse linear algebra kernels, bringing the cuSPARSE SpMV kernel to within 54% of ideal run time on an NVIDIA A6000 GPU on average. However, community-based reordering is not uniformly effective across all 50 input matrices. We investigate why community-based reordering falls short on some matrices and propose an enhanced version of community-based reordering that provides up to 1.57× additional performance improvement for the SpMV kernel.
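As a rough illustration of what reordering means for SpMV, the following Python sketch symmetrically permutes a tiny CSR matrix so that rows and columns of the same community become adjacent. It is not the method evaluated in this work and does not use cuSPARSE: the helper names (`spmv_csr`, `reorder_csr`, `mean_index_distance`), the 6×6 toy matrix, the hand-picked community assignment, and the use of mean |row − col| as a locality proxy are all assumptions made for the example. A real community-based reordering would obtain the permutation from a community detection algorithm.

```python
def spmv_csr(indptr, indices, data, x):
    """Compute y = A @ x for a matrix A stored in CSR form."""
    y = [0.0] * (len(indptr) - 1)
    for row in range(len(y)):
        for k in range(indptr[row], indptr[row + 1]):
            y[row] += data[k] * x[indices[k]]
    return y


def reorder_csr(indptr, indices, data, perm):
    """Symmetrically permute A: new row/column i is old row/column perm[i]."""
    inverse = [0] * len(perm)
    for new, old in enumerate(perm):
        inverse[old] = new
    new_indptr, new_indices, new_data = [0], [], []
    for old_row in perm:
        # Relabel the column indices of this row and keep them sorted.
        entries = sorted(
            (inverse[indices[k]], data[k])
            for k in range(indptr[old_row], indptr[old_row + 1])
        )
        new_indices.extend(col for col, _ in entries)
        new_data.extend(val for _, val in entries)
        new_indptr.append(len(new_indices))
    return new_indptr, new_indices, new_data


def mean_index_distance(indptr, indices):
    """Average |row - col| over nonzeros: a crude proxy for how scattered
    the accesses to x are (an assumption, not a measured cache metric)."""
    total = sum(
        abs(row - indices[k])
        for row in range(len(indptr) - 1)
        for k in range(indptr[row], indptr[row + 1])
    )
    return total / len(indices)


# Toy graph: two triangles, {0, 2, 4} and {1, 3, 5}, with interleaved labels.
indptr = [0, 2, 4, 6, 8, 10, 12]
indices = [2, 4, 3, 5, 0, 4, 1, 5, 0, 2, 1, 3]
data = [float(k + 1) for k in range(len(indices))]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

# Hand-picked permutation that places each community contiguously.
perm = [0, 2, 4, 1, 3, 5]
new_indptr, new_indices, new_data = reorder_csr(indptr, indices, data, perm)

# The input vector must be permuted the same way as the columns.
x_new = [x[old] for old in perm]
y = spmv_csr(indptr, indices, data, x)
y_new = spmv_csr(new_indptr, new_indices, new_data, x_new)

print("result preserved:", y_new == [y[old] for old in perm])
print("mean |row - col| before:", mean_index_distance(indptr, indices))
print("mean |row - col| after: ", mean_index_distance(new_indptr, new_indices))
```

The reordered product is the original product with its entries permuted, so reordering changes only the memory access pattern, not the numerical result. In this toy case the nonzeros move closer to the diagonal, which is the kind of effect that lets consecutive rows reuse nearby entries of the input vector.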

Full Text