This paper presents a hardware acceleration system for YOLOv5, with flame detection as its primary application. The implementation uses the APU and DPU integrated in the Zynq UltraScale+ MPSoC XCZU7EV. The proposed solution addresses the challenge of real-time target detection on mobile terminals, allowing YOLOv5 to run in real time at ultralow power consumption. Notably, the design approach can be used to deploy any TensorFlow-based target detection algorithm on mobile devices. To improve model efficiency, we employ saturated linear mapping quantization with calibration. This technique maps model weights, biases, and activations from 32-bit to 8-bit representations, incurring an accuracy loss of only 1.64%. The data flow is realized through data exchange between the DDR memory, the APU, and the DPU over the AXI4 bus. Image pre-processing and post-processing are executed on the APU, while neural network inference runs on the DPU. In experiments, the accelerated system sustains a detection speed of 56 FPS, achieves an accuracy of 36.56% on the COCO 2014 dataset, and has a total system power consumption of only 4.147 W. Furthermore, the energy efficiency is measured at 15.41 GOPS/W, a factor of 55 higher than that of the RTX A6000 graphics card.
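
The saturated linear mapping quantization described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the calibration threshold is chosen, so the nearest-rank percentile rule used here is an assumption, as are the function names `calibrate_threshold`, `quantize`, and `dequantize` and the symmetric int8 range [-127, 127]. The idea shown is that a threshold T is estimated from calibration data, values beyond T saturate, and everything inside [-T, T] is mapped linearly onto 8-bit integers.

```python
import math
from typing import List, Sequence


def calibrate_threshold(samples: Sequence[float], percentile: float = 99.9) -> float:
    """Estimate the saturation threshold T from calibration samples.

    Assumption: T is the nearest-rank percentile of |x|. The abstract does not
    specify the calibration rule, so this choice is illustrative only.
    """
    magnitudes = sorted(abs(v) for v in samples)
    if not magnitudes:
        raise ValueError("calibration set is empty")
    rank = max(1, math.ceil(percentile / 100.0 * len(magnitudes)))
    return magnitudes[rank - 1]


def quantize(values: Sequence[float], threshold: float) -> List[int]:
    """Map 32-bit floats to 8-bit integers, saturating outside [-T, T]."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    scale = 127.0 / threshold
    # Round to the nearest integer, then clamp (saturate) to the int8 range.
    return [max(-127, min(127, round(v * scale))) for v in values]


def dequantize(codes: Sequence[int], threshold: float) -> List[float]:
    """Recover approximate float values from the 8-bit codes."""
    step = threshold / 127.0
    return [c * step for c in codes]


if __name__ == "__main__":
    # Hypothetical calibration data with one large outlier.
    calibration = [0.1, -0.5, 0.25, 1.0, -0.75, 100.0]
    t = calibrate_threshold(calibration, percentile=80.0)
    codes = quantize([0.25, -1.0, 100.0, 0.0], t)
    print(t, codes, dequantize(codes, t))
```

Clipping at a calibrated threshold rather than at the absolute maximum keeps a single outlier from stretching the quantization step, which is the usual motivation for the saturated variant of linear mapping.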