Abstract

Human action recognition (HAR) has been widely employed in various applications such as autonomous cars and intelligent video surveillance. Among the algorithms proposed for HAR, the 3D-CNNs algorithm can achieve the best accuracy. However, its algorithmic complexity imposes a substantial overhead over the speed of these networks, which limits their deployment in real-life applications. This paper proposes a novel customizable architecture for 3D-CNNs based on block floating-point (BFP) arithmetic, where the utilization of BFP significantly reduces the bitwidth while eliminating the need to retrain the network. Optimizations such as locality exploration and block alignment with 3D blocking are performed to improve performance and accuracy. An analytical model and tool are developed to predict the optimized parameters for hardware customization based on user constraints such as FPGA resources or accuracy requirement. Experiments show that without retraining, a 15-bit mantissa design using single-precision accumulation on a Xilinx ZC706 device can be 8.2 times faster than an Intel i7-950 processor at 3.07 GHz with only 0.4% accuracy loss.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.