Abstract

Sparse deep neural networks (SpDNNs) have attracted considerable research and industry attention because of their powerful learning capability, and their execution time is dominated by sparse matrix-dense matrix multiplication (SpMM). As specialized processors for matrix multiplication, NVIDIA GPU Tensor Cores perform half-precision matrix-matrix multiplication at much higher throughput than CUDA Cores, offering great opportunities for SpMM acceleration. However, performing SpMM efficiently on Tensor Cores remains tremendously challenging. First, Tensor Cores handle extremely sparse matrix computations poorly, delivering much lower performance than on their dense counterparts. Second, the single-precision requirement of the Graph Challenge dataset prevents SpDNNs from leveraging the powerful half-precision Tensor Cores. To this end, we first propose a similarity-based matrix transformation scheme, which polarizes the weight matrix so that local regions become either denser or sparser; the denser and sparser workloads are then processed on Tensor Cores and CUDA Cores, respectively, boosting overall efficiency. Second, to address the half-precision limitation of Tensor Cores, we further propose a lightweight emulation algorithm that achieves single-precision computation on Tensor Cores without affecting the correctness of the final results. To the best of our knowledge, this paper is the first to accelerate SpDNN inference on Tensor Cores without compromising the precision requirement. Extensive experiments validate that our work reaches up to 300 TeraEdges per second of inference throughput on a single A100 GPU, yielding up to 89.41x and 8.12x speedups over the champions of the 2020 and 2021 Sparse Deep Neural Network Graph Challenge, respectively. Moreover, our 4-GPU version is up to 6.56x faster than the 2021 champion running on 4 GPUs and 7.55x faster than the 2020 champion running on 768 GPUs.
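To make the precision-emulation idea concrete, the sketch below illustrates the well-known split technique for recovering single-precision products from half-precision operands: each FP32 value is decomposed into a high FP16 part plus an FP16 residual, and the cross products are accumulated in FP32, mirroring the half-in/float-accumulate behavior of Tensor Cores. This is a minimal illustration of the general principle, not the paper's exact algorithm; the names `split_fp32` and `mul_fp32_emulated` are hypothetical, and the code assumes compilation with nvcc.

#include <cuda_fp16.h>
#include <cstdio>

// Split an FP32 value x into a high FP16 part and an FP16 residual so that
// x ~= (float)hi + (float)lo. (Illustrative helper, not from the paper.)
__host__ __device__ void split_fp32(float x, __half &hi, __half &lo) {
    hi = __float2half(x);                     // high-order bits, rounded to FP16
    lo = __float2half(x - __half2float(hi));  // residual captures the rounding error
}

// Emulate an FP32 product from FP16 operands with FP32 accumulation:
//   a*b ~= a_hi*b_hi + a_hi*b_lo + a_lo*b_hi
// (the a_lo*b_lo term is negligible at FP32 accuracy).
__host__ __device__ float mul_fp32_emulated(float a, float b) {
    __half a_hi, a_lo, b_hi, b_lo;
    split_fp32(a, a_hi, a_lo);
    split_fp32(b, b_hi, b_lo);
    float p = __half2float(a_hi) * __half2float(b_hi);
    p += __half2float(a_hi) * __half2float(b_lo);
    p += __half2float(a_lo) * __half2float(b_hi);
    return p;
}

int main() {
    float a = 0.123456789f, b = 3.14159265f;
    float a16 = __half2float(__float2half(a));  // operands naively rounded to FP16
    float b16 = __half2float(__float2half(b));
    printf("exact FP32 product : %.9f\n", a * b);
    printf("naive FP16 operands: %.9f\n", a16 * b16);
    printf("emulated product   : %.9f\n", mul_fp32_emulated(a, b));
    return 0;
}

In a Tensor Core setting, the same decomposition turns one FP32 matrix product into three FP16 matrix products accumulated in FP32, which is the general shape of such emulation schemes.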

