Abstract

Field Programmable Gate Array (FPGA) accelerators for CNN-based object detection have been attracting widespread attention in computer vision. For most existing FPGA accelerators, inference accuracy and speed are negatively affected by low power efficiency and performance density. To address this problem, we propose a software and hardware co-designed FPGA accelerator for accurate and fast object detection with high power efficiency and performance density. To develop the FPGA accelerator on CPU+FPGA heterogeneous platforms, a resource-sensitive and energy-aware FPGA accelerator framework is designed. On the hardware side, a hardware-sensitive neural network quantization scheme called Dynamic Fixed-point Data Quantization (DFDQ) is proposed to improve power efficiency. On the software side, an algorithm-level convolution (CONV) optimization scheme is further proposed to improve performance density through parallel block execution of CONV cores. To validate the proposed FPGA accelerator, a Zynq FPGA is used to build an acceleration platform for the You Only Look Once (YOLO) network. Results demonstrate that the proposed FPGA accelerator outperforms state-of-the-art methods in power efficiency and performance density. In addition, the speed of object detection is increased by up to 16.5 times with less than 1.5% accuracy degradation.
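To make the quantization idea concrete, the following is a minimal, illustrative sketch of dynamic fixed-point quantization in general, not the paper's actual DFDQ implementation: for each tensor, the fractional bit width is chosen from the data range and the values are rounded to that fixed-point grid. The function name, the 8-bit word length, and the integer-bit heuristic are assumptions for illustration only.

```python
import numpy as np

def dynamic_fixed_point_quantize(x, total_bits=8):
    """Illustrative dynamic fixed-point quantization: pick a per-tensor
    fractional length from the data range, then round and clip."""
    max_abs = float(np.max(np.abs(x)))
    if max_abs == 0.0:
        return np.zeros_like(x), total_bits - 1
    # Integer bits needed to cover the largest magnitude (plus sign bit);
    # the remaining bits of the word are used for the fraction.
    int_bits = int(np.ceil(np.log2(max_abs))) + 1
    frac_bits = total_bits - int_bits
    scale = 2.0 ** frac_bits
    q_min = -(2 ** (total_bits - 1))
    q_max = 2 ** (total_bits - 1) - 1
    q = np.clip(np.round(x * scale), q_min, q_max)
    # Return the de-quantized values and the chosen fractional length.
    return q / scale, frac_bits

# Example: quantize a hypothetical CONV weight tensor layer by layer.
weights = (np.random.randn(64, 3, 3, 3) * 0.1).astype(np.float32)
w_q, fl = dynamic_fixed_point_quantize(weights, total_bits=8)
print("fractional bits:", fl, "max abs error:", np.max(np.abs(weights - w_q)))
```

Because the fractional length adapts to each layer's value range, small-magnitude weight tensors keep more fractional precision than a single global fixed-point format would allow, which is the usual motivation for dynamic fixed-point schemes on FPGA hardware.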
