Abstract

The hardware design of 3D Convolutional Neural Networks (CNNs) demands substantial computation and memory because of the additional temporal dimension. This paper explores various design parameters for 3D CNNs that enable efficient implementation of such complex networks on resource-limited platforms. A new Inception-based 3D CNN model, I3D, has been chosen for investigating and optimizing these design parameters. I3D is a deep network of more than 70 layers used for action recognition in videos. First, the complexity of the model is reduced by adjusting the word lengths of the feature maps and weights of a pre-trained model, with only a negligible drop in accuracy. Second, a data tiling technique is proposed that exploits the five dimensions of a video data volume to make better use of memory bandwidth and reduce Dynamic Random Access Memory (DRAM) accesses. Finally, building on these optimizations, a complete architecture for a Field Programmable Gate Array (FPGA) based hardware accelerator is proposed. It achieves a throughput of 684 GOPs/s with a 32-bit floating-point (FP) implementation and 1.29 TOPs/s with an 8-bit integer implementation with only a 2% drop in accuracy.
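The abstract does not spell out the tiling scheme, so the following is only a rough illustration, not the paper's design. It assumes the five tiled dimensions are output channels (M), input channels (C), frames/depth (D), height (H) and width (W), and shows in plain Python that computing a 3D convolution tile by tile gives the same result as the direct loop nest while counting how many input tiles an accelerator would have to fetch from DRAM. The function names, the tile-size tuple and the fetch counter are hypothetical.

```python
import itertools
import random


def zeros4(a, b, c, d):
    """Nested-list tensor of shape [a][b][c][d] filled with 0."""
    return [[[[0] * d for _ in range(c)] for _ in range(b)] for _ in range(a)]


def out_shape(x, w):
    """x: [C][D][H][W] input volume; w: [M][C][KD][KH][KW] weights.
    Assumes stride 1 and no padding."""
    C, D, H, W = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    M, KD, KH, KW = len(w), len(w[0][0]), len(w[0][0][0]), len(w[0][0][0][0])
    return C, M, KD, KH, KW, D - KD + 1, H - KH + 1, W - KW + 1


def conv3d_naive(x, w):
    """Direct 3D convolution used as the reference result."""
    C, M, KD, KH, KW, OD, OH, OW = out_shape(x, w)
    y = zeros4(M, OD, OH, OW)
    for m, od, oh, ow in itertools.product(range(M), range(OD), range(OH), range(OW)):
        for c, kd, kh, kw in itertools.product(range(C), range(KD), range(KH), range(KW)):
            y[m][od][oh][ow] += x[c][od + kd][oh + kh][ow + kw] * w[m][c][kd][kh][kw]
    return y


def conv3d_tiled(x, w, tiles):
    """Same result as conv3d_naive, computed tile by tile.
    tiles = (Tm, Tc, Td, Th, Tw): hypothetical tile sizes for output channels,
    input channels, output frames, output height and output width.
    Returns (y, fetches); fetches counts input tiles brought on chip."""
    Tm, Tc, Td, Th, Tw = tiles
    C, M, KD, KH, KW, OD, OH, OW = out_shape(x, w)
    y = zeros4(M, OD, OH, OW)
    fetches = 0
    for m0, d0, h0, w0 in itertools.product(
            range(0, M, Tm), range(0, OD, Td), range(0, OH, Th), range(0, OW, Tw)):
        # The output tile stays on chip while partial sums from every
        # input-channel tile are accumulated into it.
        for c0 in range(0, C, Tc):
            fetches += 1  # one input tile (plus kernel halo) fetched from DRAM
            for m, od, oh, ow in itertools.product(
                    range(m0, min(m0 + Tm, M)), range(d0, min(d0 + Td, OD)),
                    range(h0, min(h0 + Th, OH)), range(w0, min(w0 + Tw, OW))):
                for c, kd, kh, kw in itertools.product(
                        range(c0, min(c0 + Tc, C)), range(KD), range(KH), range(KW)):
                    y[m][od][oh][ow] += x[c][od + kd][oh + kh][ow + kw] * w[m][c][kd][kh][kw]
    return y, fetches


random.seed(0)
C, D, H, W, M, K = 3, 6, 5, 5, 4, 3
x = [[[[random.randint(-3, 3) for _ in range(W)] for _ in range(H)]
      for _ in range(D)] for _ in range(C)]
w = [[[[[random.randint(-2, 2) for _ in range(K)] for _ in range(K)]
       for _ in range(K)] for _ in range(C)] for _ in range(M)]

y_ref = conv3d_naive(x, w)
y_tiled, fetches = conv3d_tiled(x, w, (2, 2, 2, 2, 2))
print(y_ref == y_tiled, fetches)  # prints: True 32
```

Integer inputs are used so the tiled and direct results can be compared exactly. Larger tiles reduce the number of fetches, at the cost of larger on-chip buffers; choosing tile sizes that fit the FPGA's on-chip memory is the trade-off such a tiling scheme has to balance.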
