Abstract

In recent years, a variety of dedicated hardware accelerators based on field-programmable gate arrays (FPGAs) have been proposed to accelerate convolutional neural networks (CNNs) on embedded platforms. However, the speed of dedicated hardware has long exceeded the speed of software data acquisition and loading. The hardware can therefore sit idle waiting for the software, which reduces the benefit of hardware acceleration. In this paper, a hardware-reuse strategy is proposed based on the structural characteristics of CNNs. Furthermore, a heuristic hardware/software time-balancing flow built on this hardware reuse is introduced to narrow, and even eliminate, the hardware/software time imbalance in a low-cost end-to-end embedded CNN inference system. Provided that the system's time and accuracy requirements are met, the hardware speed is tuned during hardware design to achieve hardware/software time balance. Based on these two strategies, an end-to-end embedded CNN inference system with low hardware cost and hardware/software time balance (S-CNN-ESystem) is proposed. Evaluations were performed on the Zynq xc7z020-clg400-1 platform, with LeNet-5, AlexNet, and VGG-16 used to validate the solution. Compared with accelerators implemented purely in FPGA, the hardware-to-software time-consumption ratio is much closer to 1:1.
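As a minimal illustration of the idling problem the abstract describes (this sketch is not from the paper; the function and timing values are hypothetical): in a two-stage pipeline where software data loading feeds a hardware accelerator, throughput is set by the slower stage, so a fast accelerator paired with slow loading spends most of its time idle. Balancing the two times, as the paper targets with its 1:1 ratio, removes that idle time.

```python
def hw_utilization(t_hw_ms: float, t_sw_ms: float) -> float:
    """Fraction of time the hardware is busy in a simple two-stage
    pipeline: hardware compute overlapped with software data loading.
    (Illustrative model only; timing values are hypothetical.)"""
    # Steady-state pipeline period is set by the slower of the two stages.
    stage_ms = max(t_hw_ms, t_sw_ms)
    return t_hw_ms / stage_ms

# Unbalanced: fast hardware (2 ms) waits on slow software loading (8 ms).
print(hw_utilization(2.0, 8.0))  # 0.25 -> the accelerator idles 75% of the time

# Balanced (hardware-to-software time near 1:1): no idle time.
print(hw_utilization(4.0, 4.0))  # 1.0
```

This is why the paper tunes hardware speed downward (saving hardware cost) rather than upward: making the accelerator faster than the software loader yields no end-to-end speedup in this model.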


