Abstract

Convolutional neural networks (CNNs) have become one of the mainstream deep learning algorithms, and their inference demands substantial computational resources. A sparsity-aware accelerator can effectively reduce model inference latency and power consumption with low resource overhead and almost no loss of accuracy. However, the computation fragmentation caused by sparse operations greatly reduces the efficiency of convolution. To alleviate this problem, this paper proposes an input channel expansion method that improves resource utilization. Since bandwidth often becomes the bottleneck of accelerator performance, a bandwidth-efficient data loopback structure is also designed to reduce data transfers between the accelerator and off-chip memory. The proposed hardware architecture is implemented on a Xilinx VC709 board. It contains up to 1024 multiply-accumulate (MAC) units, providing 409.6 GOP/s of peak computing power. On the VGG-16 model, its computation speed reaches 315.8 GOP/s, which is equivalent to 788 GOP/s for an accelerator without sparse-activation optimization. The data loopback bus eliminates 54.2% of activation data transfers, easing the dependence on off-chip bandwidth. This flexible sparsity-aware accelerator architecture can be widely applied to large-scale inference with deep convolutional neural networks.
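
As a sanity check on the headline figures, the peak number is consistent with each multiply-accumulate counting as two operations (one multiply, one add); note that the roughly 200 MHz clock below is inferred from the quoted values and is not stated in the abstract:

% Peak throughput from the MAC count; the ~200 MHz clock is inferred, not stated.
\[
1024~\text{MACs} \times 2~\tfrac{\text{ops}}{\text{MAC}} \times 200~\text{MHz} = 409.6~\text{GOP/s}
\]
% Effective speedup implied by the sparse-activation optimization on VGG-16:
\[
\frac{788~\text{GOP/s}}{315.8~\text{GOP/s}} \approx 2.5\times
\]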
