Abstract

Convolutional neural networks (CNNs) have become one of the mainstream deep learning algorithms, and their inference demands substantial computational resources. A sparsity-aware accelerator can effectively reduce model inference latency and power consumption with low resource overhead and almost no loss of accuracy. However, the computation fragmentation caused by sparse operations greatly reduces the efficiency of convolution. To alleviate this problem, this paper proposes an input channel expansion method that improves resource utilization. Since bandwidth often becomes the bottleneck of accelerator performance, a bandwidth-efficient data loopback structure is also designed to reduce data transfers between the accelerator and off-chip memory. The proposed hardware architecture is implemented on a Xilinx VC709 board. It contains up to 1024 multiply-accumulate (MAC) units, providing 409.6 GOP/s of peak computing power. On the VGG-16 model, its computation speed reaches 315.8 GOP/s, which is equivalent to 788 GOP/s for an accelerator without sparse-activation optimization. The data loopback bus eliminates 54.2% of activation data transfers, easing the dependence on off-chip bandwidth. This flexible sparsity-aware accelerator architecture can be widely applied to large-scale inference with deep convolutional neural networks.
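
As a sanity check on the headline figures, the peak number is consistent with each multiply-accumulate counting as two operations (one multiply, one add); note that the roughly 200 MHz clock below is inferred from the quoted values and is not stated in the abstract:

% Peak throughput from the MAC count; the ~200 MHz clock is inferred, not stated.
\[
1024~\text{MACs} \times 2~\tfrac{\text{ops}}{\text{MAC}} \times 200~\text{MHz} = 409.6~\text{GOP/s}
\]
% Effective speedup implied by the sparse-activation optimization on VGG-16:
\[
\frac{788~\text{GOP/s}}{315.8~\text{GOP/s}} \approx 2.5\times
\]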
