Abstract

In recent years, three-dimensional convolutional neural networks (3D CNNs) have been widely used in action recognition and video analysis. General-purpose processors struggle to perform such intensive computation efficiently, whereas FPGA-based deployment of 3D CNNs offers low power consumption, high energy efficiency, and customizability, and has gradually become a popular choice for deploying convolutional neural networks in many embedded scenarios. This paper designs a small 3D convolutional neural network based on the classic C3D network and uses general matrix multiplication (GEMM) to map the 3D convolution onto 2D matrix multiplication. The matrices are divided into blocks and transferred to the FPGA over the AXI bus, and the block matrix multiplication is carried out by a two-dimensional multiply-accumulate array. A System on Chip (SoC) architecture is built on the PYNQ platform with an ARM Cortex-A9 as the control core; the ARM partitions the matrices into blocks and schedules them so that the full matrix computation is completed. The matrix computation IP core is designed using High-Level Synthesis (HLS), and a corresponding parallel optimization scheme is given. Experiments verify that the prototype hardware accelerator achieves low power consumption, high energy efficiency, and high-precision action recognition while using few hardware resources.
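The GEMM mapping and blocking described above can be illustrated with a minimal software sketch. It is not the paper's HLS design: the function names, the tensor layout ([channel][depth][height][width]), the stride-1 convolution without padding, and the tile size are all assumptions made for illustration. The input is unfolded into a 2D matrix (im2col), the kernels are flattened into rows, and the product is computed tile by tile, where each tile product stands in for one job that a multiply-accumulate array might handle.

```python
from itertools import product

def conv3d_direct(x, w):
    """Reference 3D convolution (stride 1, no padding; both are assumptions).
    x: [C][D][H][W], w: [K][C][kd][kh][kw] -> out: [K][Do][Ho][Wo]"""
    C, D, H, W = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    K, kd, kh, kw = len(w), len(w[0][0]), len(w[0][0][0]), len(w[0][0][0][0])
    Do, Ho, Wo = D - kd + 1, H - kh + 1, W - kw + 1
    out = [[[[0] * Wo for _ in range(Ho)] for _ in range(Do)] for _ in range(K)]
    for k, d, h, v in product(range(K), range(Do), range(Ho), range(Wo)):
        acc = 0
        for c, i, j, l in product(range(C), range(kd), range(kh), range(kw)):
            acc += w[k][c][i][j][l] * x[c][d + i][h + j][v + l]
        out[k][d][h][v] = acc
    return out

def im2col3d(x, kd, kh, kw):
    """Unfold x into a matrix with C*kd*kh*kw rows and Do*Ho*Wo columns."""
    C, D, H, W = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    Do, Ho, Wo = D - kd + 1, H - kh + 1, W - kw + 1
    rows = []
    for c, i, j, l in product(range(C), range(kd), range(kh), range(kw)):
        # One row per kernel tap; one column per output position.
        rows.append([x[c][d + i][h + j][v + l]
                     for d, h, v in product(range(Do), range(Ho), range(Wo))])
    return rows

def flatten_weights(w):
    """Flatten each kernel into one row: K rows, C*kd*kh*kw columns."""
    C, kd, kh, kw = len(w[0]), len(w[0][0]), len(w[0][0][0]), len(w[0][0][0][0])
    return [[wk[c][i][j][l]
             for c, i, j, l in product(range(C), range(kd), range(kh), range(kw))]
            for wk in w]

def blocked_matmul(a, b, tile=4):
    """Compute a @ b tile by tile; `tile` is a hypothetical block size."""
    M, N, P = len(a), len(a[0]), len(b[0])
    c = [[0] * P for _ in range(M)]
    for i0 in range(0, M, tile):
        for j0 in range(0, P, tile):
            for k0 in range(0, N, tile):
                # One tile product, accumulated into the output tile.
                for i in range(i0, min(i0 + tile, M)):
                    for j in range(j0, min(j0 + tile, P)):
                        acc = c[i][j]
                        for k in range(k0, min(k0 + tile, N)):
                            acc += a[i][k] * b[k][j]
                        c[i][j] = acc
    return c

def conv3d_gemm(x, w, tile=4):
    """3D convolution as one 2D matrix product; returns K rows of Do*Ho*Wo values."""
    kd, kh, kw = len(w[0][0]), len(w[0][0][0]), len(w[0][0][0][0])
    return blocked_matmul(flatten_weights(w), im2col3d(x, kd, kh, kw), tile)
```

Each row of the result from `conv3d_gemm` holds one output channel in depth-major order, so it can be compared element by element with the flattened result of `conv3d_direct`. The unfolded input matrix repeats every input value once per kernel tap that touches it, which is the usual memory cost of trading convolution for a regular matrix product.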
