Reconfigurable Acceleration of 3D-CNNs for Human Action Recognition with Block Floating-Point Representation

Hongxiang Fan, Shuanglong Liu, Xinyu Niu, Ho-Cheung Ng, Wayne Luk, Zhiqiang Que

doi:10.1109/fpl.2018.00056

Abstract

Human action recognition (HAR) has been widely employed in applications such as autonomous cars and intelligent video surveillance. Among the algorithms proposed for HAR, 3D-CNNs can achieve the best accuracy. However, their algorithmic complexity imposes a substantial overhead on the speed of these networks, which limits their deployment in real-life applications. This paper proposes a novel customizable architecture for 3D-CNNs based on block floating-point (BFP) arithmetic, where the use of BFP significantly reduces the bitwidth while eliminating the need to retrain the network. Optimizations such as locality exploration and block alignment with 3D blocking are performed to improve performance and accuracy. An analytical model and tool are developed to predict the optimized parameters for hardware customization based on user constraints such as FPGA resources or accuracy requirements. Experiments show that, without retraining, a 15-bit mantissa design using single-precision accumulation on a Xilinx ZC706 device can be 8.2 times faster than an Intel i7-950 processor at 3.07 GHz with only 0.4% accuracy loss.
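To make the idea of block floating-point concrete, the following minimal Python sketch quantizes a block of values to a shared exponent with fixed-width integer mantissas. It illustrates the general BFP technique only and is not the architecture described in the paper: the function names `bfp_quantize` and `bfp_dequantize`, the choice of taking the shared exponent from the largest magnitude in the block, round-to-nearest, and counting the sign bit within the 15-bit mantissa are all assumptions made for illustration.

```python
import math


def bfp_quantize(block, mantissa_bits=15):
    """Quantize a list of floats to (integer mantissas, shared exponent).

    Assumption: mantissa_bits includes one sign bit, so the magnitude
    uses mantissa_bits - 1 bits.
    """
    max_abs = max(abs(x) for x in block)
    if max_abs == 0.0:
        return [0] * len(block), 0
    # frexp returns (m, e) with max_abs == m * 2**e and 0.5 <= m < 1,
    # so every value in the block has magnitude below 2**e.
    _, shared_exp = math.frexp(max_abs)
    frac_bits = mantissa_bits - 1
    scale = 2.0 ** (frac_bits - shared_exp)
    limit = (1 << frac_bits) - 1
    # Align every value to the shared exponent, round, and saturate.
    mantissas = [max(-limit, min(limit, round(x * scale))) for x in block]
    return mantissas, shared_exp


def bfp_dequantize(mantissas, shared_exp, mantissa_bits=15):
    """Reconstruct approximate floats from mantissas and the shared exponent."""
    frac_bits = mantissa_bits - 1
    step = 2.0 ** (shared_exp - frac_bits)
    return [m * step for m in mantissas]


if __name__ == "__main__":
    values = [1.0, 0.3, -0.7]
    mantissas, exp = bfp_quantize(values)
    print(mantissas, exp)
    print(bfp_dequantize(mantissas, exp))
```

Because all values in a block share one exponent, multiply-accumulate operations inside the block reduce to integer arithmetic on the mantissas, which is where the bitwidth saving comes from. Small values in a block dominated by a large one lose precision, since their mantissas are shifted right to match the shared exponent.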

Full Text