Abstract

Convolutional Neural Network (CNN) models have shown their dominance in computer vision tasks. Recently, a special convolution block, named the MBConv block or inverted residual block, was proposed to construct CNNs that meet real-time requirements on resource-constrained edge-computing platforms. The MBConv block was first introduced in MobileNetV2 and has been widely used to construct lightweight CNNs. However, the MBConv block brings new challenges to the structure of the computing engine, the bandwidth requirement of off-chip memory, and the demand for on-chip memory when designing hardware accelerators. In this paper, a convolution Block-Oriented Accelerator (BOA) is proposed for the inference of CNNs built from MBConv blocks. In BOA, MBConv-based CNNs are executed block by block using a Block-Based Engine, which consists of dedicated computing units for each layer of the MBConv block. To reduce both the bandwidth requirement of off-chip memory and the demand for on-chip memory, a two-level data-flow optimization and an amortized weight-loading method are proposed. Furthermore, a hierarchical scheduling scheme is proposed to improve performance and flexibility, so that BOA keeps all units running in parallel and supports various MBConv-based CNNs. Finally, we deploy BOA on a Xilinx VC709 board and evaluate the accelerator on ImageNet for image classification. The results show that BOA can execute various MBConv-based CNNs and achieves a 1.28x - 7.75x speedup in inference latency.
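For context, an MBConv (inverted residual) block expands the input channels with a 1x1 pointwise convolution, filters with a 3x3 depthwise convolution, and projects back down with a second, linear 1x1 convolution, adding a residual connection when the stride is 1 and the input and output channel counts match. Below is a minimal PyTorch sketch of this structure following the MobileNetV2 convention (expansion ratio 6, ReLU6 activations); it is illustrative only and is not the paper's hardware implementation, which maps these layers onto dedicated computing units of the Block-Based Engine.

```python
import torch
import torch.nn as nn

class MBConv(nn.Module):
    """Minimal MBConv (inverted residual) block, MobileNetV2-style.

    Illustrative sketch: BOA assigns each of these layers to a dedicated
    hardware computing unit; this only shows the block structure.
    """

    def __init__(self, in_ch: int, out_ch: int, stride: int = 1, expand: int = 6):
        super().__init__()
        hidden = in_ch * expand
        # Residual shortcut only when spatial size and channel count are preserved.
        self.use_residual = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            # 1x1 pointwise expansion
            nn.Conv2d(in_ch, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # 3x3 depthwise convolution (groups == channels)
            nn.Conv2d(hidden, hidden, 3, stride=stride, padding=1,
                      groups=hidden, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # 1x1 pointwise projection (linear, no activation)
            nn.Conv2d(hidden, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.block(x)
        return x + out if self.use_residual else out


# Example: a stride-1 block that keeps 32 channels, so the residual applies.
x = torch.randn(1, 32, 56, 56)
print(MBConv(32, 32)(x).shape)  # torch.Size([1, 32, 56, 56])
```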
