Abstract

The multiplication of a sparse matrix by a dense vector (SpMV) is a centerpiece of scientific computing applications: it is the essential kernel for the solution of sparse linear systems and sparse eigenvalue problems by iterative methods. The efficient implementation of the sparse matrix-vector multiplication is therefore crucial and has been the subject of an immense amount of research, with interest renewed with every major new trend in high-performance computing architectures. The introduction of General-Purpose Graphics Processing Units (GPGPUs) is no exception, and many articles have been devoted to this problem. With this article, we provide a review of the techniques for implementing the SpMV kernel on GPGPUs that have appeared in the literature of the last few years. We discuss the issues and tradeoffs that have been encountered by the various researchers, and present a list of solutions, organized into categories according to common features. We also provide a performance comparison across different GPGPU models and on a set of test matrices coming from various application domains.

Highlights

  • The topic we are about to discuss is a single, apparently very simple, computational kernel: the multiplication of a vector by a sparse matrix

  • The NVIDIA General Purpose Graphics Processing Units (GPGPUs) architectural model is based on a scalable array of multithreaded streaming multiprocessors, each composed of a fixed number of scalar processors, one or more instruction fetch units, fast on-chip memory, and additional special-function hardware; on the older Fermi and Kepler generations the on-chip memory is partitioned between shared memory and L1 cache, whereas on the newer Maxwell and Pascal generations the two are physically separate

  • The seminal work on accelerating the sparse matrix by a dense vector (SpMV) kernel on CUDA-enabled GPGPUs was presented by Bell and Garland in [13, 14], who provided a detailed study of sparse matrix formats and their access patterns on GPGPUs, and implemented CUDA kernels for the main classic storage formats, including COO, Compressed Sparse Rows (CSR), and ELL


Summary

Introduction

The topic we are about to discuss is a single, apparently very simple, computational kernel: the multiplication of a vector by a sparse matrix. GPGPUs appear to be good candidates for scientific computing applications, and they have attracted much interest for operations on sparse matrices, such as the matrix-vector multiplication; many researchers have taken an interest in the SpMV kernel, as witnessed for example by the works [11, 14, 22, 27, 56, 78, 85, 96] and by the development of the CUSP [24] and NVIDIA cuSPARSE [82] libraries.

Storage Formats for Sparse Matrices
COOrdinate
Compressed Sparse Rows
Compressed Sparse Columns
Storage Formats for Vector Computers
GPGPUs
A Survey of Sparse Matrix Formats on GPGPUs
COO Variants
CSR Variants
CSR Optimizations
New Formats Based on CSR
CSC Variants
ELLPACK Variants
DIA Variants
Hybrid Variants
New GPGPU-specific Storage Formats
Automated Tuning and Performance Optimization
Experimental Evaluation
Sparse Storage Formats
Test Matrices (categories include graph matrices, e.g. web, roadNet-PA)
Hardware and Software Platforms
SpMV Performance
Roofline Models
SpMV Overheads
Lessons Learned
Findings
Conclusions