Research on FPGA High-Performance Implementation Method of CNN

Xiangche Zhen, Bin He

doi:10.1109/icsp51882.2021.9408954

Abstract

Convolutional Neural Networks (CNNs) are widely used in fields such as image recognition, object detection and image segmentation. These applications place high demands on real-time data processing. With its flexibility, strong parallel processing ability and low power consumption, the Field Programmable Gate Array (FPGA) is well suited as a hardware platform for CNN processing. Taking the LeNet-5 model as its basis, this study explores high-performance implementation methods for CNN-based image recognition, and designs and implements an FPGA-based CNN hardware accelerator IP core. First, the network model is trained in PyTorch and its weight parameters are extracted. Next, the forward inference structure of the CNN is modeled in C using Xilinx High-Level Synthesis (HLS). The resulting model is then optimized with several strategies, including storage structure optimization, pipeline optimization and fixed-point quantization, to improve the performance of the hardware accelerator. Finally, the optimized design is converted into a hardware accelerator through HLS. The experimental results show that these optimization strategies greatly improve the performance of the CNN accelerator, allowing it to meet the requirements of CNN applications with strict real-time constraints.
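To make the fixed-point quantization step more concrete, the following is a minimal sketch in plain C of how floating-point weights might be converted to fixed-point and used in a multiply-accumulate. It is not the authors' implementation. The abstract does not state the bit widths used, so the signed 16-bit format with 8 fractional bits (Q8.8) and the names `fixed_t`, `to_fixed`, `to_float` and `fixed_dot` are all assumptions made for illustration. In an actual HLS design a vendor fixed-point type and pipeline directives would likely be used instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed format (not from the paper): signed 16-bit, 8 fractional bits. */
#define FRAC_BITS 8

typedef int16_t fixed_t;

/* Convert a float to Q8.8, rounding to nearest and saturating on overflow. */
static inline fixed_t to_fixed(float x)
{
    float scaled = x * (float)(1 << FRAC_BITS);
    scaled += (scaled >= 0.0f) ? 0.5f : -0.5f;
    if (scaled > 32767.0f)  return INT16_MAX;
    if (scaled < -32768.0f) return INT16_MIN;
    return (fixed_t)scaled;
}

/* Convert a Q8.8 value back to float, e.g. to check quantization error. */
static inline float to_float(fixed_t q)
{
    return (float)q / (float)(1 << FRAC_BITS);
}

/* Fixed-point dot product, the core operation of a convolution window.
 * Products carry 16 fractional bits, so the wide accumulator is rescaled
 * once at the end and then saturated back to 16 bits. */
static inline fixed_t fixed_dot(const fixed_t *w, const fixed_t *x, size_t n)
{
    assert(w != NULL && x != NULL);
    int64_t acc = 0;
    for (size_t i = 0; i < n; ++i) {
        acc += (int32_t)w[i] * (int32_t)x[i];
    }
    acc /= (1 << FRAC_BITS);            /* back to 8 fractional bits */
    if (acc > INT16_MAX) return INT16_MAX;
    if (acc < INT16_MIN) return INT16_MIN;
    return (fixed_t)acc;
}
```

In this sketch, weights exported from the trained model would be passed through `to_fixed` once, offline, and the inference loop would operate only on integers. Integer arithmetic generally maps to fewer FPGA resources than floating-point arithmetic, which is the usual motivation for this kind of quantization.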

Full Text