Accelerating approximate matrix multiplication for near-sparse matrices on GPUs

Xiaoyan Liu, Depei Qian, Ming Dun, Hailong Yang, Yi Liu, Bohong Yin, Zhongzhi Luan

doi:10.1007/s11227-022-04334-5

Abstract

Although matrix multiplication plays a vital role in computational linear algebra, there are few efficient solutions for multiplying near-sparse matrices. Sparse Approximate Matrix Multiply (SpAMM) is one of the algorithms that fill the performance gap left by traditional optimizations for dense and sparse matrix multiplication. However, existing SpAMM algorithms fail to exploit the performance potential of GPUs. In this paper, we present cuSpAMM, the first parallel SpAMM algorithm optimized for multiple GPUs. We propose several performance optimizations, including an algorithm redesign that adapts to thread parallelism, blocking strategies that optimize memory access, and acceleration with tensor cores. In addition, we scale cuSpAMM to run on multiple GPUs with an effective load-balancing scheme. We evaluate cuSpAMM on both synthesized and real-world datasets on multiple GPUs. The experimental results show that cuSpAMM achieves significant speedup compared to the vendor-optimized cuBLAS and cuSPARSE libraries.
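The abstract does not spell out how SpAMM works. As commonly described, it skips sub-block products whose norm product falls below a tolerance. The following is a minimal CPU sketch of that idea in pure Python, using a flat blocked loop rather than the recursive formulation. It is not the cuSpAMM implementation; the names `block_norm`, `spamm_blocked`, `bs`, and `tau` are hypothetical and chosen only for illustration.

```python
import math


def block_norm(m, r0, c0, bs):
    """Frobenius norm of the bs x bs block of m with top-left corner (r0, c0)."""
    return math.sqrt(sum(m[r][c] ** 2
                         for r in range(r0, r0 + bs)
                         for c in range(c0, c0 + bs)))


def spamm_blocked(a, b, bs, tau):
    """Approximate C = A * B for square n x n lists of lists (assumed layout).

    A block product A[i,k] * B[k,j] is skipped when
    ||A[i,k]||_F * ||B[k,j]||_F < tau. Returns (C, number_of_skipped_products).
    """
    n = len(a)
    assert n % bs == 0, "this sketch assumes n is a multiple of the block size"
    nb = n // bs

    # Precompute one norm per block so the skip test is a single multiply.
    norm_a = [[block_norm(a, i * bs, k * bs, bs) for k in range(nb)]
              for i in range(nb)]
    norm_b = [[block_norm(b, k * bs, j * bs, bs) for j in range(nb)]
              for k in range(nb)]

    c = [[0.0] * n for _ in range(n)]
    skipped = 0
    for i in range(nb):
        for j in range(nb):
            for k in range(nb):
                if norm_a[i][k] * norm_b[k][j] < tau:
                    skipped += 1  # contribution is bounded by tau, so drop it
                    continue
                # Dense multiply-accumulate for the surviving block pair.
                for r in range(i * bs, (i + 1) * bs):
                    for col in range(j * bs, (j + 1) * bs):
                        acc = 0.0
                        for t in range(k * bs, (k + 1) * bs):
                            acc += a[r][t] * b[t][col]
                        c[r][col] += acc
    return c, skipped
```

Calling `spamm_blocked(a, b, bs, 0.0)` skips nothing and reduces to an ordinary blocked multiplication, which gives a simple baseline for measuring the error introduced by a given `tau`.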

Journal: The Journal of Supercomputing
Publication Date: Feb 14, 2022
Citations: 4

Similar Papers

Recovering single precision accuracy from Tensor Cores while surpassing the FP32 theoretical peak performance
Hiroyuki Ootomo ... Rio Yokota
The International Journal of High Performance Computing Applications | VOL. 36
03 Jun 2022

DGEMM Using Tensor Cores, and Its Accurate and Reproducible Versions
Daichi Mukunoki ... Takeshi Ogita
01 Jan 2020

The bit complexity of matrix multiplication and of related computations in linear algebra. The segmented λ algorithms
V.Y Pan
Computers and Mathematics with Applications | VOL. 11
01 Sep 1985

Accelerating Sparse Deep Neural Network Inference Using GPU Tensor Cores
Yufei Sun ... Long Zheng
19 Sep 2022
