Abstract

Convolutional Neural Networks (CNNs) are widely used in fields such as image recognition, object detection and image segmentation. These applications place high demands on real-time data processing. Because Field Programmable Gate Arrays (FPGAs) offer good flexibility, strong parallel processing ability and low power consumption, they are well suited as a hardware platform for CNN inference. Using the LeNet-5 CNN model, this study explores high-performance implementation methods for CNN-based image recognition, and designs and implements an FPGA-based CNN hardware accelerator IP core. First, the network model is trained in PyTorch and its weight parameters are extracted. Next, the forward inference structure of the CNN is modeled in C using Xilinx High-Level Synthesis (HLS). The resulting model is then optimized with several strategies, including storage structure optimization, pipeline optimization and fixed-point quantization, to improve the performance of the hardware accelerator. Finally, the optimized design is converted into a hardware accelerator through HLS. The experimental results show that these optimization strategies greatly improve the performance of the CNN accelerator, which can then meet the requirements of CNN applications with strict real-time constraints.
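
As an illustration of the fixed-point quantization step mentioned above, the following is a minimal sketch in plain C. It is not the study's implementation: the Q8.8 format (8 fractional bits in a 16-bit word) and the names `fixed_t`, `to_fixed`, `to_float` and `fixed_dot` are assumptions chosen for illustration, and the bit widths actually used in the accelerator are not stated in the abstract.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed format: Q8.8 (8 integer bits, 8 fractional bits). Hypothetical choice. */
#define FRAC_BITS 8

typedef int16_t fixed_t;

/* Quantize a float (e.g. a trained weight) to Q8.8, rounding to nearest
 * and saturating at the limits of the 16-bit range. */
fixed_t to_fixed(float x) {
    float scaled = x * (float)(1 << FRAC_BITS);
    scaled += (scaled >= 0.0f) ? 0.5f : -0.5f;
    if (scaled > 32767.0f) return INT16_MAX;
    if (scaled < -32768.0f) return INT16_MIN;
    return (fixed_t)scaled;
}

/* Convert a Q8.8 value back to float, for checking quantization error. */
float to_float(fixed_t q) {
    return (float)q / (float)(1 << FRAC_BITS);
}

/* Fixed-point dot product, the core operation of a convolution window.
 * Products are accumulated in a 64-bit integer so they cannot overflow,
 * then rescaled to Q8.8 and saturated. */
fixed_t fixed_dot(const fixed_t *w, const fixed_t *x, int n) {
    assert(n >= 0);
    int64_t acc = 0;
    for (int i = 0; i < n; i++) {
        acc += (int64_t)w[i] * (int64_t)x[i]; /* each product has 16 fractional bits */
    }
    acc /= (1 << FRAC_BITS); /* drop 8 fractional bits, truncating toward zero */
    if (acc > INT16_MAX) return INT16_MAX;
    if (acc < INT16_MIN) return INT16_MIN;
    return (fixed_t)acc;
}
```

Replacing floating-point multiply-accumulate operations with integer ones like this generally reduces the FPGA resources each operation needs, at the cost of a bounded quantization error. In an HLS flow, a loop such as the one in `fixed_dot` is also a typical candidate for pipeline optimization.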
