Abstract

Recent research on video-based human action recognition has progressed with the development of 3-dimensional deep convolutional networks (3-D ConvNets). In particular, spatiotemporal features have exhibited improved performance. However, the temporal information that is ubiquitous in video has not been fully exploited by existing 3-D ConvNets. In this paper, we propose a novel Residual Non-degenerate Temporal Network (RNTN) for human action recognition, which can sufficiently exploit the temporal information in frames. Specifically, RNTN mainly consists of residual non-degenerate temporal blocks (RNTB) and 3-D effective channel attention blocks (3D-ECA). In RNTB, the expression of temporal features is effectively enhanced. In 3D-ECA, the potential connections between features are strengthened by letting each channel's features interact with those of its adjacent channels. Our approach achieves state-of-the-art performance on the UCF-101 (98.33%) and HMDB-51 (80.04%) datasets.
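The adjacent-channel interaction described for the 3D-ECA block can be sketched as ECA-style channel attention adapted to 5-D video features: globally pool each channel over its spatiotemporal extent, then compute each channel's gate from its own pooled value and its neighbours via a 1-D convolution across the channel axis. The sketch below is a minimal illustration under these assumptions, not the paper's implementation; the function name `eca_3d` and the fixed averaging kernel (standing in for learned weights) are hypothetical.

```python
import numpy as np

def eca_3d(x, k=3):
    """ECA-style channel attention for a 5-D video feature map.

    x: array of shape (N, C, T, H, W). Each channel's attention weight
    is computed from its own pooled value and its k-1 neighbouring
    channels via a 1-D convolution across the channel axis, so there is
    no dimensionality reduction. Hypothetical sketch: the kernel is a
    fixed average here; in a real network it would be learned.
    """
    n, c, t, h, w = x.shape
    # Global average pooling over the spatiotemporal dims -> (N, C)
    y = x.mean(axis=(2, 3, 4))
    # 1-D convolution across channels with a shared kernel (padding=k//2)
    kernel = np.full(k, 1.0 / k)  # stand-in for learned conv weights
    pad = k // 2
    y_pad = np.pad(y, ((0, 0), (pad, pad)), mode="edge")
    z = np.stack(
        [(y_pad[:, i:i + k] * kernel).sum(axis=1) for i in range(c)],
        axis=1,
    )
    # Sigmoid gate per channel, then rescale the input features
    gate = 1.0 / (1.0 + np.exp(-z))  # (N, C), values in (0, 1)
    return x * gate[:, :, None, None, None]
```

Because the kernel slides over channels, each channel's gate depends only on a local channel neighbourhood, which matches the abstract's description of strengthening connections between adjacent channel features.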
