Abstract

The multiplication of a sparse matrix by a dense vector (SpMV) is a centerpiece of scientific computing applications: it is the essential kernel for the solution of sparse linear systems and sparse eigenvalue problems by iterative methods. The efficient implementation of the sparse matrix-vector multiplication is therefore crucial and has been the subject of an immense amount of research, with interest renewed with every major new trend in high-performance computing architectures. The introduction of General-Purpose Graphics Processing Units (GPGPUs) is no exception, and many articles have been devoted to this problem. With this article, we provide a review of the techniques for implementing the SpMV kernel on GPGPUs that have appeared in the literature of the last few years. We discuss the issues and tradeoffs that have been encountered by the various researchers, and present a list of solutions, organized into categories according to common features. We also provide a performance comparison across different GPGPU models and on a set of test matrices coming from various application domains.

Highlights

  • The topic we are about to discuss is a single, apparently very simple, computational kernel: the multiplication of a vector by a sparse matrix

  • The NVIDIA General Purpose Graphics Processing Units (GPGPUs) architectural model is based on a scalable array of multithreaded streaming multiprocessors, each composed of a fixed number of scalar processors, one or more instruction fetch units, fast on-chip memory, and additional special-function hardware; on the older Fermi and Kepler generations the on-chip memory is partitioned between shared memory and L1 cache, whereas on the newer Maxwell and Pascal generations the two are physically separate

  • The seminal work on accelerating the sparse matrix by a dense vector (SpMV) kernel on CUDA-enabled GPGPUs was presented by Bell and Garland in [13, 14], who provided a detailed study of sparse matrix formats and their access patterns on GPGPUs, and implemented CUDA kernels for the main classic storage formats, including COO, Compressed Sparse Rows (CSR), and ELL


Summary

Introduction

The topic we are about to discuss is a single, apparently very simple, computational kernel: the multiplication of a vector by a sparse matrix. GPGPUs appear to be good candidates for scientific computing applications, and they have attracted much interest for operations on sparse matrices, such as the matrix-vector multiplication; many researchers have taken an interest in the SpMV kernel, as witnessed for example by the works [11, 14, 22, 27, 56, 78, 85, 96] and by the development of the CUSP [24] and NVIDIA cuSPARSE [82] libraries.

Storage Formats for Sparse Matrices
COOrdinate
Compressed Sparse Rows
Compressed Sparse Columns
Storage Formats for Vector Computers
GPGPUs
A Survey of Sparse Matrix Formats on GPGPUs
COO Variants
CSR Variants
CSR Optimizations
New Formats Based on CSR
CSC Variants
ELLPACK Variants
DIA Variants
Hybrid Variants
New GPGPU-specific Storage Formats
Automated Tuning and Performance Optimization
Experimental Evaluation
Sparse Storage Formats
Test Matrices (categories include graph matrices, e.g. web, roadNet-PA)
Hardware and Software Platforms
SpMV Performance
Roofline Models
SpMV Overheads
Lessons Learned
Findings
Conclusions