Optimizing Sparse Matrix Vector Multiplication Using Cache Blocking Method on Fermi GPU

Weizhi Xu, Hao Zhang, Shuai Jiao, Fenglong Song, Da Wang, Zhiyong Liu

doi:10.1109/snpd.2012.20

Publication Date: Aug 1, 2012

Citations: 39

Affiliation: Institute of Computing Technology, Chinese Academy of Sciences

#SpMV Kernel #GPU Cache

Abstract

Tuning the performance of sparse matrix-vector multiplication (SpMV) is important but difficult because of its irregularity. In this paper, we propose a cache blocking method to improve the performance of SpMV on the emerging GPU architecture. The sparse matrix is partitioned into many sub-blocks, each stored in CSR format. With blocking, the corresponding part of vector x can be reused in the GPU cache, which greatly reduces the time spent accessing global memory for x. Experimental results on a GeForce GTX 480 show that the SpMV kernel with cache blocking is up to 5x faster than the unblocked CSR kernel.
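The blocking idea in the abstract can be illustrated with a small CPU-side reference sketch in Python. This is not the paper's GPU kernel: the 2D partitioning scheme, the block sizes, and every function name here (`csr_from_dense`, `partition_blocks`, `spmv_blocked`, and so on) are assumptions made for illustration. It only shows how a matrix might be split into sub-blocks that each carry their own CSR arrays, so that every sub-block touches one contiguous slice of x.

```python
def csr_from_dense(dense):
    """Build CSR arrays (row_ptr, col_idx, vals) from a dense list of rows."""
    row_ptr, col_idx, vals = [0], [], []
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                col_idx.append(j)
                vals.append(v)
        row_ptr.append(len(vals))
    return row_ptr, col_idx, vals


def spmv_csr(csr, x, n_rows):
    """Unblocked CSR SpMV, y = A x, used here as the baseline."""
    row_ptr, col_idx, vals = csr
    y = [0.0] * n_rows
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]
    return y


def partition_blocks(csr, n_rows, n_cols, block_rows, block_cols):
    """Split a CSR matrix into sub-blocks (hypothetical 2D scheme).

    Each non-empty sub-block keeps its own CSR arrays with column
    indices relative to the start of its column range.
    """
    row_ptr, col_idx, vals = csr
    blocks = []
    for r0 in range(0, n_rows, block_rows):
        r1 = min(r0 + block_rows, n_rows)
        for c0 in range(0, n_cols, block_cols):
            c1 = min(c0 + block_cols, n_cols)
            b_ptr, b_col, b_val = [0], [], []
            for i in range(r0, r1):
                for k in range(row_ptr[i], row_ptr[i + 1]):
                    if c0 <= col_idx[k] < c1:
                        b_col.append(col_idx[k] - c0)
                        b_val.append(vals[k])
                b_ptr.append(len(b_val))
            if b_val:  # empty sub-blocks contribute nothing to y
                blocks.append((r0, c0, c1, (b_ptr, b_col, b_val)))
    return blocks


def spmv_blocked(blocks, x, n_rows):
    """Blocked SpMV: each sub-block reads only its own slice of x."""
    y = [0.0] * n_rows
    for r0, c0, c1, (b_ptr, b_col, b_val) in blocks:
        x_slice = x[c0:c1]  # the part of x that this sub-block reuses
        for local_i in range(len(b_ptr) - 1):
            acc = 0.0
            for k in range(b_ptr[local_i], b_ptr[local_i + 1]):
                acc += b_val[k] * x_slice[b_col[k]]
            y[r0 + local_i] += acc
    return y
```

Calling `spmv_blocked(partition_blocks(csr, n_rows, n_cols, 2, 2), x, n_rows)` should give the same vector as `spmv_csr(csr, x, n_rows)`. On a GPU, the point of the slice is that a sub-block's working set of x is small enough to stay in cache; in this sequential sketch the slice only makes that access pattern explicit.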
