Abstract

Deep neural networks (DNNs) have been widely used in many fields. With ever-increasing model sizes, the scalability of DNNs suffers. Sparse deep neural networks (SpDNNs) are a promising way to resolve this problem, but their sparse data make them difficult to execute efficiently on GPUs due to load imbalance and irregular memory accesses. The recent MIT/IEEE/Amazon GraphChallenge has produced several significant advances in fitting sparse DNNs onto GPUs, but we observe that none of these earlier efforts is an absolute winner across all datasets, because each considers only a limited optimization space. In this paper, we identify new opportunities for optimizing SpDNN execution through a comprehensive analysis of previous work. Based on this enlarged design space, we present sparsity-aware SpMM algorithms that systematically explore performance-optimal solutions for SpDNN execution on GPUs and generate optimized SpMM kernel implementations. Compared to the 2020 HPEC Sparse DNN Challenge champions, our approach achieves an inference throughput of up to 55.6 TeraEdges per second, with speedups of up to 13.74× [1] and 22.29× [2] on a single NVIDIA V100 GPU. We also show that, in many cases, our approach on 4 GPUs outperforms the 2020 Challenge champion running on 768 GPUs. The source code is available at https://github.com/CGCL-codes/Graphchallenge21.
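For context, the core computation the abstract refers to is sparse matrix-matrix multiplication (SpMM) applied layer by layer with a ReLU activation. The following is a minimal, naive CUDA sketch of that baseline computation, not the paper's sparsity-aware kernels: a CSR-based SpMM with one thread per output element, which exhibits exactly the load imbalance (rows with many nonzeros) and irregular memory accesses (gathers through column indices) that the proposed approach targets. All names and the tiny example matrices are illustrative assumptions, not taken from the paper.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Naive CSR-based SpMM with ReLU: C = ReLU(A * B), where A is sparse
// (m x k, CSR) and B, C are dense row-major (k x n and m x n).
// One thread per output element; long rows of A cause load imbalance,
// and colIdx-driven gathers from B cause irregular memory accesses.
__global__ void csr_spmm_relu(int m, int n,
                              const int* rowPtr, const int* colIdx,
                              const float* vals, const float* B, float* C)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= m || col >= n) return;

    float acc = 0.0f;
    for (int p = rowPtr[row]; p < rowPtr[row + 1]; ++p)
        acc += vals[p] * B[colIdx[p] * n + col];

    C[row * n + col] = fmaxf(acc, 0.0f);   // ReLU, as in each SpDNN layer
}

int main() {
    // Tiny illustrative example: 2x3 sparse A times 3x2 dense B.
    const int m = 2, k = 3, n = 2;
    int   hRowPtr[] = {0, 2, 3};          // row 0: 2 nonzeros, row 1: 1 nonzero
    int   hColIdx[] = {0, 2, 1};
    float hVals[]   = {1.0f, 2.0f, -3.0f};
    float hB[k * n] = {1, 2, 3, 4, 5, 6}; // k x n, row-major
    float hC[m * n];

    int *dRowPtr, *dColIdx; float *dVals, *dB, *dC;
    cudaMalloc(&dRowPtr, sizeof(hRowPtr)); cudaMalloc(&dColIdx, sizeof(hColIdx));
    cudaMalloc(&dVals, sizeof(hVals));     cudaMalloc(&dB, sizeof(hB));
    cudaMalloc(&dC, sizeof(hC));
    cudaMemcpy(dRowPtr, hRowPtr, sizeof(hRowPtr), cudaMemcpyHostToDevice);
    cudaMemcpy(dColIdx, hColIdx, sizeof(hColIdx), cudaMemcpyHostToDevice);
    cudaMemcpy(dVals, hVals, sizeof(hVals), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, sizeof(hB), cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid((n + block.x - 1) / block.x, (m + block.y - 1) / block.y);
    csr_spmm_relu<<<grid, block>>>(m, n, dRowPtr, dColIdx, dVals, dB, dC);
    cudaMemcpy(hC, dC, sizeof(hC), cudaMemcpyDeviceToHost);

    for (int i = 0; i < m; ++i)
        printf("%.1f %.1f\n", hC[i * n + 0], hC[i * n + 1]);
    return 0;
}
```

In an SpDNN inference pass, a kernel of this kind would be invoked once per layer, feeding each layer's output into the next; the paper's contribution is choosing, per layer and per sparsity pattern, a better-performing kernel variant from a larger design space than this one-thread-per-element baseline.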
