Abstract

Sparse matrix-vector multiplication (SpMV) is the dominant kernel in scientific simulations. Many-core processors such as GPUs accelerate SpMV computations with their high parallelism and memory bandwidth compared to CPUs; however, even on many-core processors, SpMV performance remains strongly limited by memory bandwidth, and poor locality of memory accesses to the input vector causes further performance degradation. We propose a new sparse matrix format called the Adaptive Multi-level Blocking (AMB) format, which aggressively reduces the memory traffic in SpMV computation to improve performance. Through several optimization techniques, such as dividing and blocking the given matrix, the column indices are compressed and the reuse of input vector elements in the cache is greatly improved. An auto-tuning mechanism determines the best set of parameters for each matrix by estimating the memory traffic and predicting the performance of the resulting SpMV computation. For 32 matrix datasets from the University of Florida Sparse Matrix Collection, the AMB format achieves speedups of up to 2.92x over NVIDIA's cuSPARSE library and up to 1.40x over yaSpMV, a recently proposed library that has been the best known to date for fast SpMV computation.
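For context, the baseline that such formats improve upon can be pictured as a standard CSR-based SpMV kernel. The sketch below is a minimal, illustrative CUDA kernel, not the AMB implementation described in the paper, and all identifiers are assumptions. It shows where the bandwidth bound comes from: every nonzero requires loading a value, a column index, and an element of the input vector x gathered at an irregular position, so index traffic and poor cache reuse of x dominate.

```cuda
#include <cuda_runtime.h>

// Baseline CSR SpMV (illustrative sketch): y = A * x, one thread per row.
// Each thread streams a contiguous range of values and column indices
// (row_ptr[row] .. row_ptr[row+1]), but gathers x[col] at scattered
// positions, which is the main source of irregular, bandwidth-bound traffic.
__global__ void spmv_csr(int num_rows,
                         const int*    __restrict__ row_ptr,
                         const int*    __restrict__ col_idx,
                         const double* __restrict__ values,
                         const double* __restrict__ x,
                         double*       __restrict__ y)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < num_rows) {
        double sum = 0.0;
        for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j) {
            sum += values[j] * x[col_idx[j]];   // irregular gather from x
        }
        y[row] = sum;
    }
}
```

The AMB format, as summarized in the abstract, targets exactly these two costs: compressing the column indices and blocking the matrix so that accesses to the input vector hit in cache more often.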
