Abstract

Convolutional Neural Network (CNN) models have shown their dominance in computer vision tasks. Recently, a special convolution block, named the MBConv block or inverted residual block, was proposed to construct CNNs that meet real-time requirements on resource-constrained edge-computing platforms. The MBConv block was first introduced in MobileNetV2 and has been widely used to construct lightweight CNNs. However, the MBConv block brings new challenges to the structure of the computing engine, the bandwidth requirement of off-chip memory, and the demand for on-chip memory when designing hardware accelerators. In this paper, a convolution Block-Oriented Accelerator (BOA) is proposed for the inference of CNNs built from MBConv blocks. In BOA, MBConv-based CNNs are executed block by block using a Block-Based Engine, which consists of dedicated computing units for each layer of the MBConv block. To reduce both the bandwidth requirement of off-chip memory and the demand for on-chip memory, a two-level data-flow optimization and an amortized weight-loading method are proposed. Furthermore, a hierarchical scheduling scheme is proposed to improve performance and flexibility, so that BOA keeps all units running in parallel and supports various MBConv-based CNNs. Finally, we deploy BOA on a Xilinx VC709 board and evaluate the accelerator on ImageNet for image classification. The results show that BOA can execute various MBConv-based CNNs and achieves a 1.28x - 7.75x speedup in inference latency.
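For context, an MBConv (inverted residual) block expands the input channels with a 1x1 pointwise convolution, filters with a 3x3 depthwise convolution, and projects back down with a second, linear 1x1 convolution, adding a residual connection when the stride is 1 and the input and output channel counts match. Below is a minimal PyTorch sketch of this structure following the MobileNetV2 convention (expansion ratio 6, ReLU6 activations); it is illustrative only and is not the paper's hardware implementation, which maps these layers onto dedicated computing units of the Block-Based Engine.

```python
import torch
import torch.nn as nn

class MBConv(nn.Module):
    """Minimal MBConv (inverted residual) block, MobileNetV2-style.

    Illustrative sketch: BOA assigns each of these layers to a dedicated
    hardware computing unit; this only shows the block structure.
    """

    def __init__(self, in_ch: int, out_ch: int, stride: int = 1, expand: int = 6):
        super().__init__()
        hidden = in_ch * expand
        # Residual shortcut only when spatial size and channel count are preserved.
        self.use_residual = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            # 1x1 pointwise expansion
            nn.Conv2d(in_ch, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # 3x3 depthwise convolution (groups == channels)
            nn.Conv2d(hidden, hidden, 3, stride=stride, padding=1,
                      groups=hidden, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # 1x1 pointwise projection (linear, no activation)
            nn.Conv2d(hidden, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.block(x)
        return x + out if self.use_residual else out


# Example: a stride-1 block that keeps 32 channels, so the residual applies.
x = torch.randn(1, 32, 56, 56)
print(MBConv(32, 32)(x).shape)  # torch.Size([1, 32, 56, 56])
```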
