Abstract

Sparse matrix–dense matrix multiplication (SpMM) takes one sparse matrix and one dense matrix as inputs and produces one dense matrix as output. It plays a vital role in fields such as deep neural networks, graph neural networks, and graph analysis. CUDA, NVIDIA's parallel computing platform, provides the cuSPARSE library to support Basic Linear Algebra Subprograms (BLAS) operations on sparse matrices, including SpMM. In sparse matrices, zero values can be discarded from storage and computation to accelerate execution. To store only the non-zero values, the cuSPARSE library supports several sparse formats, such as COO (COOrdinate), CSR (Compressed Sparse Row), and CSC (Compressed Sparse Column). In addition, since the introduction of 3rd-generation Tensor Cores with the Ampere architecture, CUDA has provided the cuSPARSELt library for SpMM in which the sparse matrix satisfies a 2:4 sparsity pattern, i.e., approximately 50% sparsity, as commonly arises in machine learning. In this paper, we compare the cuSPARSE library and the cuSPARSELt library for SpMM in the case of sparse matrices with a 2:4 sparsity pattern (50% sparsity). Furthermore, we compare the performance of the three formats for SpMM in the cuSPARSE library at higher sparsity levels: 75%, 87.5%, and 99%.
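To make the concepts in the abstract concrete, the sketch below (illustrative only, not the paper's code) builds the COO and CSR representations of a small matrix in plain Python and checks the 2:4 sparsity pattern, i.e., at most 2 non-zeros in every aligned group of 4 consecutive values per row; all function names here are hypothetical.

```python
# Illustrative sketch: COO and CSR representations of a sparse matrix,
# plus a check for the 2:4 structured-sparsity pattern used by cuSPARSELt.

def to_coo(dense):
    """Return (row_indices, col_indices, values) listing each non-zero."""
    rows, cols, vals = [], [], []
    for i, row in enumerate(dense):
        for j, v in enumerate(row):
            if v != 0:
                rows.append(i)
                cols.append(j)
                vals.append(v)
    return rows, cols, vals

def to_csr(dense):
    """Return (row_ptr, col_indices, values): row i's non-zeros occupy
    positions row_ptr[i] .. row_ptr[i+1]-1 of the value array."""
    row_ptr, cols, vals = [0], [], []
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                cols.append(j)
                vals.append(v)
        row_ptr.append(len(vals))
    return row_ptr, cols, vals

def satisfies_2_4(dense):
    """True if every aligned group of 4 values per row has <= 2 non-zeros."""
    for row in dense:
        for k in range(0, len(row), 4):
            if sum(1 for v in row[k:k + 4] if v != 0) > 2:
                return False
    return True

A = [
    [1, 0, 0, 2],
    [0, 3, 4, 0],
]
print(to_coo(A))        # ([0, 0, 1, 1], [0, 3, 1, 2], [1, 2, 3, 4])
print(to_csr(A))        # ([0, 2, 4], [0, 3, 1, 2], [1, 2, 3, 4])
print(satisfies_2_4(A)) # True: 2 non-zeros per group of 4 (50% sparsity)
```

Note how CSR compresses the per-non-zero row indices of COO into one `row_ptr` array of length `rows + 1`, which is why CSR is often preferred for row-wise traversal in SpMM kernels.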
