Abstract

Sparse matrix-vector multiplication (SpMV) is a basic building block of many scientific applications. Several GPU-accelerated SpMV algorithms for the CSR format suffer from workload imbalance on irregular matrices. In this paper, we propose a new auxiliary-array-assisted CSR format called local segmented reduction based CSR (LSRB-CSR), which enables synchronization-free preprocessing and an efficient SpMV algorithm using lightweight auxiliary arrays. It is efficient for both regular and irregular matrices and incurs only a tiny preprocessing overhead. We compare our LSRB-CSR based SpMV algorithm with the CSR-based SpMV from cuSPARSE, the segmented-reduction-based SpMV algorithm adopted by the CUDPP library, and the CSR5-based SpMV algorithm on both regular and irregular sparse matrices. Compared to cuSPARSE, our LSRB-CSR based SpMV algorithm improves performance by 26% on regular matrices and by up to 4750% on irregular matrices. Compared to CUDPP, it improves average SpMV performance by 210% on regular matrices and by 250% on irregular matrices. Its performance is comparable to that of the CSR5-based SpMV algorithm on regular matrices, and it outperforms the CSR5-based algorithm on irregular matrices. Experimental results show that the conversion overhead from CSR to LSRB-CSR is, on average, only one tenth of the overhead from CSR to CSR5.
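To make the workload-imbalance problem concrete, the following minimal sketch contrasts a row-wise CSR SpMV with a segmented-reduction formulation on a small irregular matrix. It is a sequential Python illustration only, not the LSRB-CSR algorithm (whose auxiliary arrays the abstract does not describe) and not the cuSPARSE or CUDPP implementation; the function names `spmv_csr_rowwise` and `spmv_csr_segmented` and the example matrix are assumptions made for illustration.

```python
def spmv_csr_rowwise(row_ptr, col_idx, vals, x):
    """Row-wise CSR SpMV: one unit of work per row.

    If each row were assigned to one GPU thread, a row with many
    nonzeros would keep its thread busy far longer than the others.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]
    return y


def spmv_csr_segmented(row_ptr, col_idx, vals, x):
    """Segmented-reduction view: work is indexed by nonzero, not by row.

    Step 1 forms one product per nonzero (evenly divisible work).
    Step 2 sums the products within each row segment delimited by row_ptr.
    This loop is sequential; a GPU version would use a parallel
    segmented scan or reduction instead.
    """
    n_rows = len(row_ptr) - 1
    prod = [v * x[c] for v, c in zip(vals, col_idx)]
    y = [0.0] * n_rows
    row = 0
    for k, p in enumerate(prod):
        # Advance to the row that owns nonzero k (this also skips empty rows).
        while k >= row_ptr[row + 1]:
            row += 1
        y[row] += p
    return y


# Hypothetical irregular 4x4 matrix: row 1 is empty, row 2 holds 4 of the 6 nonzeros.
row_ptr = [0, 1, 1, 5, 6]
col_idx = [0, 0, 1, 2, 3, 2]
vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = [1.0, 2.0, 3.0, 4.0]

y_row = spmv_csr_rowwise(row_ptr, col_idx, vals, x)
y_seg = spmv_csr_segmented(row_ptr, col_idx, vals, x)
# Nonzeros per row, showing how uneven the row-wise work is.
row_lengths = [row_ptr[i + 1] - row_ptr[i] for i in range(len(row_ptr) - 1)]
```

Both functions produce the same vector, but in the row-wise version the work per row follows `row_lengths` (here 1, 0, 4, 1), whereas the segmented version distributes the multiplications evenly over the six nonzeros and defers the row boundaries to the reduction step.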
