Abstract

Deep convolutional neural networks (DCNNs) have recently emerged as a promising approach for computer vision tasks, and many new DCNN architectures have been proposed to further improve their performance. However, their significant computational workload limits the deployment of such networks on embedded devices. Research on accelerating DCNN inference typically targets field-programmable gate arrays (FPGAs) because of their programmability, yet hardware efficiency and reconfigurability often receive insufficient attention. This paper proposes an efficient accelerator that supports multiple DCNNs and improves hardware utilization from three perspectives. First, a bandwidth-based tiling algorithm improves data transfer efficiency for direct memory access (DMA). Second, three parallel strategies improve the utilization of the computing units (CUs). Third, a configurable CU improves digital signal processor (DSP) utilization. The proposed accelerator is implemented on the Xilinx ZYNQ-7 ZC706 Evaluation Board at 200 MHz. It reaches 163 giga operations per second (GOPS) and 0.36 GOPS/DSP on VGG-16 while consuming only 448 DSPs, and achieves 0.24 GOPS/DSP on ResNet-50 and 0.27 GOPS/DSP on YOLOv2-tiny. The experimental results demonstrate that this design achieves a better trade-off among hardware resource consumption, performance, and reconfigurability than previous works.
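To illustrate the idea behind bandwidth-based tiling, the sketch below picks a tile shape that fits an on-chip buffer while keeping each DMA burst (one contiguous tile row) long enough to amortize per-transfer setup overhead. This is a minimal, hypothetical illustration of the general technique, not the paper's actual algorithm; all function and parameter names (`choose_tile`, `buf_elems`, `min_burst`) are assumptions introduced for this example.

```c
/* Hypothetical sketch of bandwidth-based tiling for DMA efficiency.
 * Goal: the widest possible contiguous tile row (long DMA bursts),
 * then as many rows as the on-chip buffer can hold.
 * All names and parameters are illustrative, not from the paper. */
typedef struct {
    int tile_h;  /* tile height in feature-map rows */
    int tile_w;  /* tile width in elements = one contiguous DMA burst */
} tile_cfg;

tile_cfg choose_tile(int fmap_h, int fmap_w,
                     int buf_elems,   /* on-chip buffer capacity (elements) */
                     int min_burst)   /* minimum efficient burst length */
{
    tile_cfg c;
    /* Prefer a full feature-map row per burst if it fits the buffer. */
    c.tile_w = fmap_w <= buf_elems ? fmap_w : buf_elems;
    /* Pad narrow rows up to the minimum burst length when possible,
     * so the DMA engine is not dominated by setup overhead. */
    if (c.tile_w < min_burst && min_burst <= buf_elems)
        c.tile_w = min_burst;
    /* Stack as many rows as the remaining buffer budget allows. */
    c.tile_h = buf_elems / c.tile_w;
    if (c.tile_h > fmap_h)
        c.tile_h = fmap_h;
    return c;
}
```

For a 224x224 feature map and a 4096-element buffer, this yields 224-wide bursts of 18 rows per tile; for a 56-wide map it widens the burst to the 64-element minimum instead of issuing many short transfers.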
