Abstract

Sparse matrix–vector multiplication (SpMV) is of singular importance in sparse linear algebra, which in turn is central to scientific computing and engineering practice. Much effort has been devoted to accelerating SpMV, and many parallel solutions have been proposed. This paper focuses on a special type of SpMV, namely sparse quasi-diagonal matrix–vector multiplication (SQDMV). Sparse quasi-diagonal matrices are key to solving many differential equations, yet very little research has been done in this field. This paper discusses data structures and algorithms for SQDMV that are efficiently implemented on the compute unified device architecture (CUDA) platform for the fine-grained parallel architecture of the graphics processing unit (GPU). We present HDC, a new diagonal storage format that is a hybrid of the diagonal format (DIA) and the compressed sparse row format (CSR); it overcomes the inefficiency of DIA in storing irregular matrices and the load imbalance of CSR when the number of non-zero elements per row varies. Furthermore, HDC can adjust the storage bandwidth of its diagonal part to match the degree of scatter of the sparse matrix, thereby achieving a higher compression ratio than DIA or CSR and reducing computational complexity. Our GPU implementation shows that HDC outperforms the other formats, especially for matrices with some discrete points outside the main diagonal. In addition, we combine the different parts of HDC into a unified kernel to obtain a better compression ratio and a higher speedup on the GPU.
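The abstract describes HDC only at a high level. As a rough illustration of how a hybrid diagonal-plus-CSR SpMV kernel might be organized, the CUDA sketch below computes each row's dot product in two passes: a DIA-style pass over the stored diagonals and a CSR-style pass over the scattered off-band entries. The kernel name, array names, and layout (hdc_spmv, dia_offsets, one thread per row, and so on) are illustrative assumptions, not the paper's actual implementation.

```cuda
// Hypothetical HDC layout (sketch, not the paper's code):
//   DIA part: ndiags diagonals; dia_offsets[d] is the column offset of
//             diagonal d; dia_vals[d * n + row] holds A[row][row + offset].
//   CSR part: csr_rowptr[n+1], csr_cols[nnz], csr_vals[nnz] for the
//             scattered non-zeros that fall outside the stored band.
__global__ void hdc_spmv(int n, int ndiags,
                         const int* dia_offsets, const double* dia_vals,
                         const int* csr_rowptr, const int* csr_cols,
                         const double* csr_vals,
                         const double* x, double* y)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= n) return;

    double sum = 0.0;

    // Banded part, DIA-style: each diagonal is stored contiguously,
    // so consecutive threads read consecutive elements (coalesced).
    for (int d = 0; d < ndiags; ++d) {
        int col = row + dia_offsets[d];
        if (col >= 0 && col < n)
            sum += dia_vals[(size_t)d * n + row] * x[col];
    }

    // Scattered off-band entries, CSR-style: per-row index list.
    for (int j = csr_rowptr[row]; j < csr_rowptr[row + 1]; ++j)
        sum += csr_vals[j] * x[csr_cols[j]];

    y[row] = sum;
}

// Example launch: one thread per row.
// hdc_spmv<<<(n + 255) / 256, 256>>>(n, ndiags, dia_offsets, dia_vals,
//                                    csr_rowptr, csr_cols, csr_vals, x, y);
```

Fusing both passes into a single kernel, as sketched here, avoids launching separate DIA and CSR kernels and writing partial results to global memory in between, which is one plausible reading of the unified kernel mentioned in the abstract.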
