A model-driven blocking strategy for load balanced sparse matrix–vector multiplication on GPUs

Arash Ashari, Naser Sedaghati, John Eisenlohr, P Sadayappan

doi:10.1016/j.jpdc.2014.11.001

Abstract

Sparse matrix–vector multiplication (SpMV) is one of the key operations in linear algebra. Sparsity and irregularity cause thread divergence, load imbalance, and uncoalesced, indirect memory accesses, and these are the main challenges in optimizing SpMV on GPUs. In this paper we present a new Blocked Row–Column (BRC) storage format with a two-dimensional blocking mechanism that addresses these challenges effectively. BRC reduces thread divergence by reordering the rows of the input matrix and blocking rows with a nearly equal number of non-zero elements onto the same execution units (i.e., warps). It improves load balance by partitioning rows into blocks with a constant number of non-zeros, so that different warps perform the same amount of work. We also present an approach to optimize BRC performance by judiciously selecting the block size based on the sparsity characteristics of the matrix. A CUDA implementation of BRC outperforms the NVIDIA CUSP and cuSPARSE libraries and other state-of-the-art SpMV formats on a range of unstructured sparse matrices from multiple application domains. The BRC format has been integrated with PETSc, enabling its use in PETSc’s solvers. Furthermore, when the input matrix is partitioned, BRC achieves near-linear speedup on multiple GPUs.
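The two blocking ideas described above can be illustrated with a small host-side sketch. It is a simplified, hypothetical layout in the spirit of BRC, not the paper's exact format or its performance model. Rows are sorted by descending non-zero count, and consecutive rows are grouped into blocks of `height` rows (the default of 32 mirrors a warp). Each row group is then cut into chunks of `width` non-zeros, so every block holds exactly `height * width` possibly padded slots and represents the same amount of work. The names `build_blocked`, `spmv`, and `choose_width` are assumptions for illustration only. The same applies to the per-block overhead constant in the toy width-selection cost model.

```python
from dataclasses import dataclass, field

WARP_SIZE = 32  # default rows per block, mirroring a CUDA warp (assumption)


@dataclass
class BlockedMatrix:
    """Hypothetical BRC-style layout: every block holds height * width slots."""
    n_rows: int
    height: int                                      # rows per block
    width: int                                       # non-zero slots per row per block
    block_rows: list = field(default_factory=list)   # original row ids in each block
    block_cols: list = field(default_factory=list)   # height x width column ids, -1 = padding
    block_vals: list = field(default_factory=list)   # height x width values, 0.0 = padding


def build_blocked(rows, width, height=WARP_SIZE):
    """rows[r] is a list of (col, val) pairs for matrix row r."""
    m = BlockedMatrix(len(rows), height, width)
    # Reorder rows by descending non-zero count so each block groups rows of similar length.
    order = sorted(range(len(rows)), key=lambda r: len(rows[r]), reverse=True)
    for start in range(0, len(order), height):
        group = order[start:start + height]
        longest = len(rows[group[0]])  # rows are sorted, so the first is the longest
        # Cut the row group into column chunks of `width` non-zeros each (2-D blocking).
        for offset in range(0, longest, width):
            cols = [[-1] * width for _ in range(height)]
            vals = [[0.0] * width for _ in range(height)]
            for i, r in enumerate(group):
                for j, (c, v) in enumerate(rows[r][offset:offset + width]):
                    cols[i][j], vals[i][j] = c, v
            m.block_rows.append(group)
            m.block_cols.append(cols)
            m.block_vals.append(vals)
    return m


def spmv(m, x):
    """y = A @ x over the blocked layout; each block does identical (padded) work."""
    y = [0.0] * m.n_rows
    for group, cols, vals in zip(m.block_rows, m.block_cols, m.block_vals):
        # On a GPU, one warp could take one block, one thread per row (assumption).
        for i, r in enumerate(group):
            acc = 0.0
            for c, v in zip(cols[i], vals[i]):
                if c >= 0:  # skip padding slots
                    acc += v * x[c]
            y[r] += acc  # a long row's partial sums from several blocks are added up
    return y


def choose_width(rows, candidates=(4, 8, 16, 32), block_overhead=64, height=WARP_SIZE):
    """Toy cost model (hypothetical): padded slots processed plus a fixed cost per block."""
    def cost(w):
        n_blocks = len(build_blocked(rows, w, height).block_rows)
        return n_blocks * (height * w + block_overhead)
    return min(candidates, key=cost)


if __name__ == "__main__":
    rows = [[(0, 1.0)],
            [(0, 2.0), (1, 3.0), (4, 4.0)],
            [],
            [(2, 5.0), (3, 6.0)],
            [(1, 7.0), (2, 8.0), (3, 9.0), (4, 10.0)]]
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(spmv(build_blocked(rows, width=2, height=2), x))  # [1.0, 28.0, 0.0, 39.0, 124.0]
    print(choose_width(rows))  # 4
```

A smaller width wastes fewer padded slots on rows of uneven length but produces more blocks. The toy `choose_width` simply weighs those two effects against each other. The paper instead selects the block size from the sparsity characteristics of the matrix.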

Journal: Journal of Parallel and Distributed Computing
Publication Date: Nov 12, 2014
Citations: 23

Similar Papers

An efficient two-dimensional blocking strategy for sparse matrix-vector multiplication on GPUs
Arash Ashari ... P Sadayappan
10 Jun 2014

Optimizing Sparse Matrix–Vector Multiplications on an ARMv8-based Many-Core Architecture
Donglin Chen ... Chuanfu Xu
International Journal of Parallel Programming | VOL. 47
01 Jan 2019

Compiler transformation to generate hybrid sparse computations
...
13 Nov 2016

Evaluate Metadata of Sparse Matrix for SpMV on Shared Memory Architecture
Nazmul Ahasan Maruf ... Waseem Ahmed
International Journal of Advanced Computer Science and Applications | VOL. 10
01 Jan 2019