Abstract

In recent years, a variety of dedicated hardware accelerators based on field-programmable gate arrays (FPGAs) have been proposed to accelerate convolutional neural networks (CNNs) on embedded platforms. However, the speed of dedicated hardware has long exceeded the speed of software data acquisition and loading. The hardware can therefore sit idle waiting for the software, which reduces the benefit of hardware acceleration. In this paper, a hardware-reuse strategy is proposed based on the structural characteristics of CNNs. Furthermore, a heuristic hardware/software time-balancing flow built on this hardware reuse is introduced to narrow, and even eliminate, the hardware/software time imbalance in a low-cost end-to-end embedded CNN inference system. Provided that the system's time and accuracy requirements are met, the hardware speed is tuned during hardware design to achieve hardware/software time balance. Based on these two strategies, an end-to-end embedded CNN inference system with low hardware cost and hardware/software time balance (S-CNN-ESystem) is proposed. Evaluations were performed on the Zynq xc7z020-clg400-1 platform, with LeNet-5, AlexNet, and VGG-16 used to validate the solution. Compared with accelerators implemented purely in FPGA, the hardware-to-software time-consumption ratio is much closer to 1:1.
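As a minimal illustration of the idling problem the abstract describes (this sketch is not from the paper; the function and timing values are hypothetical): in a two-stage pipeline where software data loading feeds a hardware accelerator, throughput is set by the slower stage, so a fast accelerator paired with slow loading spends most of its time idle. Balancing the two times, as the paper targets with its 1:1 ratio, removes that idle time.

```python
def hw_utilization(t_hw_ms: float, t_sw_ms: float) -> float:
    """Fraction of time the hardware is busy in a simple two-stage
    pipeline: hardware compute overlapped with software data loading.
    (Illustrative model only; timing values are hypothetical.)"""
    # Steady-state pipeline period is set by the slower of the two stages.
    stage_ms = max(t_hw_ms, t_sw_ms)
    return t_hw_ms / stage_ms

# Unbalanced: fast hardware (2 ms) waits on slow software loading (8 ms).
print(hw_utilization(2.0, 8.0))  # 0.25 -> the accelerator idles 75% of the time

# Balanced (hardware-to-software time near 1:1): no idle time.
print(hw_utilization(4.0, 4.0))  # 1.0
```

This is why the paper tunes hardware speed downward (saving hardware cost) rather than upward: making the accelerator faster than the software loader yields no end-to-end speedup in this model.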


