CUDA-enabled Sparse Matrix–Vector Multiplication on GPUs using atomic operations

Hoang-Vu Dang, Bertil Schmidt

doi:10.1016/j.parco.2013.09.005

Abstract

Existing formats for Sparse Matrix–Vector Multiplication (SpMV) on the GPU outperform their corresponding implementations on multi-core CPUs. In this paper, we present a new format called Sliced COO (SCOO) and an efficient CUDA implementation that performs SpMV on the GPU using atomic operations. We compare SCOO performance to existing formats of the NVIDIA Cusp library using large sparse matrices. Our results for single-precision floating-point matrices show that SCOO outperforms the COO and CSR formats for all tested matrices and the HYB format for all tested unstructured matrices on a single GPU. Furthermore, our dual-GPU implementation achieves an efficiency of 94% on average. Because existing CUDA-enabled GPUs perform atomic operations on double-precision floating-point numbers more slowly, the double-precision SCOO implementation does not consistently outperform the other formats for every unstructured matrix. Overall, the average speedup of SCOO for the tested benchmark dataset is 3.33 (1.56) compared to CSR, 5.25 (2.42) compared to COO, and 2.39 (1.37) compared to HYB for single (double) precision on a Tesla C2075. Furthermore, a comparison to a Sandy Bridge CPU shows that SCOO on a Fermi GPU outperforms the multi-threaded CSR implementation of the Intel MKL library on an i7-2700K by a factor between 5.5 (2.3) and 18 (12.7) for single (double) precision. Source code is available at https://github.com/danghvu/cudaSpmv.
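To make the role of atomic operations concrete, the following is a minimal sequential sketch of SpMV over plain COO triplets. It is not the authors' SCOO kernel: the function name `coo_spmv` and the small 3x3 matrix are illustrative assumptions. The point it shows is that each nonzero contributes independently to `y[row]`, so when one GPU thread handles each nonzero, several threads may update the same output element, and the accumulation has to be an atomic add.

```python
from typing import List, Sequence


def coo_spmv(
    n_rows: int,
    rows: Sequence[int],
    cols: Sequence[int],
    vals: Sequence[float],
    x: Sequence[float],
) -> List[float]:
    """Compute y = A @ x for A stored as COO (row, col, value) triplets.

    Hypothetical helper for illustration only; not taken from cudaSpmv.
    """
    if not (len(rows) == len(cols) == len(vals)):
        raise ValueError("rows, cols and vals must have the same length")

    y = [0.0] * n_rows
    for r, c, v in zip(rows, cols, vals):
        # In a CUDA kernel with one thread per nonzero, this update would
        # be an atomic add on y[r], because other threads may be
        # accumulating into the same row at the same time.
        y[r] += v * x[c]
    return y


# Illustrative 3x3 matrix (an assumption, not data from the paper):
# [[4, 0, 1],
#  [0, 2, 0],
#  [3, 0, 5]]
rows = [0, 0, 1, 2, 2]
cols = [0, 2, 1, 0, 2]
vals = [4.0, 1.0, 2.0, 3.0, 5.0]
x = [1.0, 2.0, 3.0]

y = coo_spmv(3, rows, cols, vals, x)
print(y)
```

The loop order does not matter for correctness here, which is what allows the nonzeros to be processed in parallel once the accumulation is made atomic.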

Journal: Parallel Computing
Publication Date: Oct 7, 2013
Citations: 42

Similar Papers

Large-Scale Sparse Singular Value Computations
Michael W Berry
The International Journal of Supercomputing Applications | VOL. 6
01 Apr 1992

Computing Extremal Singular Triplets of Sparse Matrices on a Shared-Memory Multiprocessor
M.W Berry ... A.H Sameh
International Journal of High Speed Computing | VOL. 06
01 Jun 1994

The Sliced COO Format for Sparse Matrix-Vector Multiplication on CUDA-enabled GPUs
Hoang-Vu Dang ... Bertil Schmidt
Procedia Computer Science | VOL. 9
01 Jan 2012

Sparse matrix partitioning for optimizing SpMV on CPU-GPU heterogeneous platforms
Akrem Benatia ... Weixing Ji
The International Journal of High Performance Computing Applications | VOL. 34
14 Nov 2019
