Abstract
Visual context is fundamental to understanding human actions in videos. However, the discriminative temporal information in videos is usually sparse: most frames are redundant and mixed with a large amount of interference, which can lead to wasted computation and recognition failure. An important question is therefore how to employ temporal context information efficiently. In this paper, we propose a learnable temporal attention mechanism that automatically selects important time points from action sequences. We design an unsupervised Recurrent Temporal Sparse Autoencoder (RTSAE) network, which learns to extract sparse keyframes that sharpen discriminative power while retaining descriptive capability and shielding interference. By applying this technique to a dual-stream convolutional neural network, we significantly improve performance in both accuracy and efficiency. Experiments demonstrate that, with the help of the RTSAE, our method achieves results competitive with the state of the art on the UCF101 and HMDB51 datasets.
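The core idea of selecting sparse keyframes via temporal attention can be illustrated with a minimal sketch. This is not the paper's RTSAE implementation: the scoring vector `w`, the dot-product scoring rule, and the top-k selection are all illustrative assumptions standing in for the learned recurrent autoencoder.

```python
import numpy as np

def temporal_attention_keyframes(features, w, k):
    """Hypothetical keyframe selector: score each frame, softmax over
    time, and keep the k frames with the highest attention weight.

    features: (T, D) array of per-frame CNN features
    w:        (D,) illustrative learned scoring vector
    k:        number of keyframes to retain
    """
    scores = features @ w                       # (T,) raw attention logits
    scores = scores - scores.max()              # numerical stability
    attn = np.exp(scores) / np.exp(scores).sum()  # softmax over time
    top_k = np.argsort(attn)[::-1][:k]          # indices of largest weights
    return np.sort(top_k), attn                 # keep temporal order

# Toy usage: 16 frames of 8-dim features.
rng = np.random.default_rng(0)
feats = rng.normal(size=(16, 8))
w = rng.normal(size=8)
idx, attn = temporal_attention_keyframes(feats, w, k=4)
```

In a trained system the attention weights would be learned end-to-end so that high-weight frames carry the discriminative motion cues, and only those frames are passed to the downstream classifier.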