Abstract

Neural network pruning, which can be divided into unstructured and structured strategies, has proven to be an effective way to substantially reduce the computational cost of convolutional neural networks (CNNs). However, combining the advantages of these two pruning strategies remains difficult. This article proposes a high-performance accelerator for unstructured sparse CNNs. First, a convolution-based filter selection and clustering method (FSCM) is proposed to reorder unstructured sparse filters into uniform-size dense filters, eliminating redundant computations in the convolutional layers while maintaining a regular structure. Second, a convolution and pooling calculation method (CPCM) is presented to reduce inter-layer computational redundancy. Third, a hardware accelerator with high digital signal processing (DSP) efficiency is designed to exploit FSCM and CPCM. The proposed accelerator is implemented on a Xilinx XCVU9P platform at 300 MHz. Across different CNN configurations, it achieves 1490.53 GOPS (3.882 GOPS/DSP) on LeNet, 913.85 GOPS (1.785 GOPS/DSP) on AlexNet, and 862.16 GOPS (1.68 GOPS/DSP) on VGG16. Compared with previous field-programmable gate array (FPGA) and application-specific integrated circuit (ASIC) designs, the accelerator achieves up to $15.43\times$ and $7.73\times$ improvements in performance and DSP efficiency, respectively.
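
To make the filter selection and clustering idea concrete, below is a minimal Python sketch, not the authors' implementation: the function name `cluster_sparse_filters`, the greedy smallest-union heuristic, and the `group_size` parameter are all illustrative assumptions. It groups unstructured sparse filters whose nonzero-weight patterns overlap the most, so each group shares one compact index set and can be packed into a uniform-size dense filter block.

```python
import numpy as np

def cluster_sparse_filters(filters, group_size):
    """Hypothetical FSCM-style sketch: pack sparse filters into dense groups.

    filters: array of shape (N, C, K, K) with many zero weights.
    group_size: number of filters packed into each uniform-size group.
    Returns a list of (filter_indices, shared_nonzero_positions, dense_weights).
    """
    n = filters.shape[0]
    flat = filters.reshape(n, -1)
    masks = flat != 0                      # nonzero-weight pattern per filter
    remaining = list(range(n))
    groups = []
    while remaining:
        seed = remaining.pop(0)
        group = [seed]
        union = masks[seed].copy()
        # Greedily add the filter that grows the shared index set the least,
        # keeping the packed dense block as small as possible.
        while len(group) < group_size and remaining:
            costs = [np.count_nonzero(union | masks[j]) for j in remaining]
            group.append(remaining.pop(int(np.argmin(costs))))
            union |= masks[group[-1]]
        idx = np.flatnonzero(union)        # shared nonzero weight positions
        dense = flat[group][:, idx]        # uniform-size dense weight block
        groups.append((group, idx, dense))
    return groups
```

In this sketch, each group carries uniform-size dense weight slices plus the shared index set used to gather the matching input activations, which is the property that lets dense compute units stay fully utilized on unstructured sparse filters.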
