Abstract

Sparse matrix–vector multiplication (SpMV) is of singular importance in sparse linear algebra, which in turn is central to scientific computing and engineering practice. Much effort has been devoted to accelerating SpMV, and many parallel solutions have been proposed. This paper focuses on a special type of SpMV, namely sparse quasi-diagonal matrix–vector multiplication (SQDMV). Sparse quasi-diagonal matrices are key to solving many differential equations, yet very little research has been done in this field. This paper discusses data structures and algorithms for SQDMV that are efficiently implemented on the compute unified device architecture (CUDA) platform for the fine-grained parallel architecture of the graphics processing unit (GPU). We present HDC, a new diagonal storage format that is a hybrid of the diagonal format (DIA) and the compressed sparse row format (CSR); it overcomes the inefficiency of DIA in storing irregular matrices and the load imbalance of CSR when the number of non-zero elements per row varies. Furthermore, HDC can adjust the storage bandwidth of its diagonal part to match the degree of scatter of the sparse matrix, thereby achieving a higher compression ratio than DIA or CSR and reducing computational complexity. Our GPU implementation shows that HDC outperforms the other formats, especially for matrices with some discrete points outside the main diagonal. In addition, we combine the different parts of HDC into a unified kernel to obtain a better compression ratio and a higher speedup on the GPU.
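The abstract describes HDC only at a high level. As a rough illustration of how a hybrid diagonal-plus-CSR SpMV kernel might be organized, the CUDA sketch below computes each row's dot product in two passes: a DIA-style pass over the stored diagonals and a CSR-style pass over the scattered off-band entries. The kernel name, array names, and layout (hdc_spmv, dia_offsets, one thread per row, and so on) are illustrative assumptions, not the paper's actual implementation.

```cuda
// Hypothetical HDC layout (sketch, not the paper's code):
//   DIA part: ndiags diagonals; dia_offsets[d] is the column offset of
//             diagonal d; dia_vals[d * n + row] holds A[row][row + offset].
//   CSR part: csr_rowptr[n+1], csr_cols[nnz], csr_vals[nnz] for the
//             scattered non-zeros that fall outside the stored band.
__global__ void hdc_spmv(int n, int ndiags,
                         const int* dia_offsets, const double* dia_vals,
                         const int* csr_rowptr, const int* csr_cols,
                         const double* csr_vals,
                         const double* x, double* y)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= n) return;

    double sum = 0.0;

    // Banded part, DIA-style: each diagonal is stored contiguously,
    // so consecutive threads read consecutive elements (coalesced).
    for (int d = 0; d < ndiags; ++d) {
        int col = row + dia_offsets[d];
        if (col >= 0 && col < n)
            sum += dia_vals[(size_t)d * n + row] * x[col];
    }

    // Scattered off-band entries, CSR-style: per-row index list.
    for (int j = csr_rowptr[row]; j < csr_rowptr[row + 1]; ++j)
        sum += csr_vals[j] * x[csr_cols[j]];

    y[row] = sum;
}

// Example launch: one thread per row.
// hdc_spmv<<<(n + 255) / 256, 256>>>(n, ndiags, dia_offsets, dia_vals,
//                                    csr_rowptr, csr_cols, csr_vals, x, y);
```

Fusing both passes into a single kernel, as sketched here, avoids launching separate DIA and CSR kernels and writing partial results to global memory in between, which is one plausible reading of the unified kernel mentioned in the abstract.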
