Abstract

Deep learning (DL) has achieved breakthroughs across a wide range of intelligent tasks, including vision, language, and recommendation systems. Sparse matrix multiplication (SpMM) is the key computation kernel of most sparse DL models. Conventional computing platforms, such as CPUs, GPUs, and AI chips built from regular processing units, cannot support sparse computation efficiently because of their fixed structures and instruction sets. This work extends Sparkle, an accelerator architecture developed specifically for processing SpMM in DL. Modifications to the balanced data-loading process enhance the flexibility of the Sparkle architecture. In addition, a Sparkle generator is proposed to accommodate diverse resource constraints and enable adaptable deployment: leveraging Sparkle's structural parameters and a template-based design method, the generator automatically produces Sparkle circuits under varying parameter settings. An instantiated Sparkle accelerator with a specific configuration is implemented on the Xilinx xqvu11p FPGA platform. Compared to the state-of-the-art SpMM accelerator SIGMA, this Sparkle instance improves sparse computing efficiency by about 10 to 20 \(\%\), and it achieves 7.76 \(\times\) higher performance than the Nvidia Orin NX GPU. Additional accelerator instances with different parameters were also evaluated, demonstrating that the Sparkle architecture can effectively accelerate SpMM.
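For context, the SpMM kernel targeted by such accelerators multiplies a sparse matrix, typically stored in a compressed format such as CSR, by a dense matrix. The following is a minimal reference sketch in C, assuming a CSR-encoded sparse operand and row-major dense operands; the function and variable names are illustrative only and do not describe the Sparkle accelerator's actual dataflow.

#include <stddef.h>

/* Reference SpMM: C = A * B, where A is sparse (CSR) and B, C are dense.
 * A is m x k with nnz nonzeros; B is k x n (row-major); C is m x n (row-major).
 * Illustrative sketch only, not the Sparkle hardware mapping. */
void spmm_csr(size_t m, size_t n,
              const size_t *row_ptr,   /* length m + 1 */
              const size_t *col_idx,   /* length nnz   */
              const float  *values,    /* length nnz   */
              const float  *B,         /* k x n, row-major */
              float        *C)         /* m x n, row-major */
{
    for (size_t i = 0; i < m; ++i) {
        /* Clear output row i. */
        for (size_t j = 0; j < n; ++j)
            C[i * n + j] = 0.0f;
        /* Accumulate the contribution of each nonzero in row i of A. */
        for (size_t p = row_ptr[i]; p < row_ptr[i + 1]; ++p) {
            float a = values[p];
            const float *b_row = &B[col_idx[p] * n];
            for (size_t j = 0; j < n; ++j)
                C[i * n + j] += a * b_row[j];
        }
    }
}

Only the nonzeros of A are visited, which is precisely why fixed dense compute structures are poorly utilized on sparse workloads and why dedicated SpMM accelerators are attractive.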
