Abstract

Sparse deep neural networks (SpDNNs) have attracted considerable research and industry attention because of their powerful learning capability, and their execution time is dominated by sparse matrix-dense matrix multiplication (SpMM). As specialized processors for matrix multiplication, NVIDIA GPU Tensor Cores perform half-precision matrix-matrix multiplication at much higher throughput than CUDA Cores, offering great opportunities for SpMM acceleration. However, performing SpMM efficiently on Tensor Cores remains tremendously challenging. First, Tensor Cores handle extremely sparse matrix computations poorly, delivering much lower performance than on their dense counterparts. Second, the single-precision requirement of the Graph Challenge dataset prevents SpDNNs from leveraging the powerful half-precision Tensor Cores. To this end, we first propose a similarity-based matrix transformation scheme, which polarizes the weight matrix so that local regions become either denser or sparser; the denser and sparser workloads are then processed on Tensor Cores and CUDA Cores, respectively, boosting overall efficiency. Second, to address the half-precision limitation of Tensor Cores, we further propose a lightweight emulation algorithm that achieves single-precision computation on Tensor Cores without affecting the correctness of the final results. To the best of our knowledge, this paper is the first to accelerate SpDNN inference on Tensor Cores without compromising the precision requirement. Extensive experiments validate that our work reaches up to 300 TeraEdges per second of inference throughput on a single A100 GPU, yielding up to 89.41x and 8.12x speedups over the champions of the 2020 and 2021 Sparse Deep Neural Network Graph Challenge, respectively. Moreover, our 4-GPU version is up to 6.56x faster than the 2021 champion running on 4 GPUs and 7.55x faster than the 2020 champion running on 768 GPUs.
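To make the precision-emulation idea concrete, the sketch below illustrates the well-known split technique for recovering single-precision products from half-precision operands: each FP32 value is decomposed into a high FP16 part plus an FP16 residual, and the cross products are accumulated in FP32, mirroring the half-in/float-accumulate behavior of Tensor Cores. This is a minimal illustration of the general principle, not the paper's exact algorithm; the names `split_fp32` and `mul_fp32_emulated` are hypothetical, and the code assumes compilation with nvcc.

#include <cuda_fp16.h>
#include <cstdio>

// Split an FP32 value x into a high FP16 part and an FP16 residual so that
// x ~= (float)hi + (float)lo. (Illustrative helper, not from the paper.)
__host__ __device__ void split_fp32(float x, __half &hi, __half &lo) {
    hi = __float2half(x);                     // high-order bits, rounded to FP16
    lo = __float2half(x - __half2float(hi));  // residual captures the rounding error
}

// Emulate an FP32 product from FP16 operands with FP32 accumulation:
//   a*b ~= a_hi*b_hi + a_hi*b_lo + a_lo*b_hi
// (the a_lo*b_lo term is negligible at FP32 accuracy).
__host__ __device__ float mul_fp32_emulated(float a, float b) {
    __half a_hi, a_lo, b_hi, b_lo;
    split_fp32(a, a_hi, a_lo);
    split_fp32(b, b_hi, b_lo);
    float p = __half2float(a_hi) * __half2float(b_hi);
    p += __half2float(a_hi) * __half2float(b_lo);
    p += __half2float(a_lo) * __half2float(b_hi);
    return p;
}

int main() {
    float a = 0.123456789f, b = 3.14159265f;
    float a16 = __half2float(__float2half(a));  // operands naively rounded to FP16
    float b16 = __half2float(__float2half(b));
    printf("exact FP32 product : %.9f\n", a * b);
    printf("naive FP16 operands: %.9f\n", a16 * b16);
    printf("emulated product   : %.9f\n", mul_fp32_emulated(a, b));
    return 0;
}

In a Tensor Core setting, the same decomposition turns one FP32 matrix product into three FP16 matrix products accumulated in FP32, which is the general shape of such emulation schemes.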

