Abstract

Depthwise separable convolution (DSC) has become one of the essential building blocks of lightweight convolutional neural networks (CNNs). Nevertheless, its hardware architecture has received little attention. Previous hardware designs incur either high off-chip memory traffic or large on-chip memory usage, and therefore fall short in hardware efficiency, performance, or both. This paper proposes two efficient dynamic design techniques, adaptive row-based dataflow scheduling and adaptive computation mapping, to achieve a much better trade-off between hardware efficiency and performance for DSC-based lightweight CNN accelerators. The effectiveness and efficiency of the proposed dynamic design techniques have been extensively evaluated on six DSC-based lightweight CNNs. Compared with the reference architectures, simulation results show that the proposed techniques reduce on-chip buffer size by at least 50.4% and improve convolution performance by at least 1.18× while maintaining the minimum off-chip memory traffic. MobileNetV2 is implemented on a Zynq UltraScale+ ZCU102 SoC FPGA; the proposed accelerator achieves 381.7 frames per second (fps), 1.43× that of the reference design, and saves about 36.3% of on-chip buffer size relative to the reference design while maintaining the same off-chip memory traffic.
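For readers unfamiliar with the DSC structure the abstract refers to, the following is a minimal PyTorch sketch of a depthwise separable convolution block (a per-channel 3×3 depthwise convolution followed by a 1×1 pointwise convolution). The class name, channel sizes, and input shape are illustrative only and do not represent the proposed accelerator's dataflow scheduling or computation mapping.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise separable convolution: a per-channel (depthwise) 3x3
    convolution followed by a 1x1 pointwise convolution."""
    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        # Depthwise stage: groups == in_channels gives one 3x3 filter per channel.
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size=3,
                                   stride=stride, padding=1,
                                   groups=in_channels, bias=False)
        # Pointwise stage: 1x1 convolution mixes information across channels.
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1,
                                   bias=False)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

# Example: a MobileNet-style block applied to a 32-channel 112x112 feature map.
x = torch.randn(1, 32, 112, 112)
y = DepthwiseSeparableConv(32, 64)(x)
print(y.shape)  # torch.Size([1, 64, 112, 112])
```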
