Abstract

Recent research on video-based human action recognition has progressed with the development of 3-dimensional deep convolutional networks (3-D ConvNets). In particular, spatiotemporal features have exhibited improved performance. However, the temporal information that is ubiquitous in video has not been fully exploited by existing 3-D ConvNets. In this paper, we propose a novel Residual Non-degenerate Temporal Network (RNTN) for human action recognition, which can sufficiently exploit the temporal information in frames. Specifically, RNTN mainly consists of residual non-degenerate temporal blocks (RNTB) and 3-D effective channel attention blocks (3D-ECA). In RNTB, the expression of temporal features is effectively enhanced. In 3D-ECA, the potential connections between features are strengthened by letting each channel's features interact with those of its adjacent channels. Our approach achieves state-of-the-art performance on the UCF-101 (98.33%) and HMDB-51 (80.04%) datasets.
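The adjacent-channel interaction described for the 3D-ECA block can be sketched as ECA-style channel attention adapted to 5-D video features: globally pool each channel over its spatiotemporal extent, then compute each channel's gate from its own pooled value and its neighbours via a 1-D convolution across the channel axis. The sketch below is a minimal illustration under these assumptions, not the paper's implementation; the function name `eca_3d` and the fixed averaging kernel (standing in for learned weights) are hypothetical.

```python
import numpy as np

def eca_3d(x, k=3):
    """ECA-style channel attention for a 5-D video feature map.

    x: array of shape (N, C, T, H, W). Each channel's attention weight
    is computed from its own pooled value and its k-1 neighbouring
    channels via a 1-D convolution across the channel axis, so there is
    no dimensionality reduction. Hypothetical sketch: the kernel is a
    fixed average here; in a real network it would be learned.
    """
    n, c, t, h, w = x.shape
    # Global average pooling over the spatiotemporal dims -> (N, C)
    y = x.mean(axis=(2, 3, 4))
    # 1-D convolution across channels with a shared kernel (padding=k//2)
    kernel = np.full(k, 1.0 / k)  # stand-in for learned conv weights
    pad = k // 2
    y_pad = np.pad(y, ((0, 0), (pad, pad)), mode="edge")
    z = np.stack(
        [(y_pad[:, i:i + k] * kernel).sum(axis=1) for i in range(c)],
        axis=1,
    )
    # Sigmoid gate per channel, then rescale the input features
    gate = 1.0 / (1.0 + np.exp(-z))  # (N, C), values in (0, 1)
    return x * gate[:, :, None, None, None]
```

Because the kernel slides over channels, each channel's gate depends only on a local channel neighbourhood, which matches the abstract's description of strengthening connections between adjacent channel features.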
