Abstract
Complex backgrounds, lighting conditions, and other action-irrelevant visual information in video frames introduce considerable redundancy and noise into action spatial features, seriously degrading the accuracy of action recognition. To address this, we propose a recurrent region attention cell that captures action-relevant regional visual information in spatial features and, exploiting the temporal nature of video, build a Recurrent Region Attention model (RRA) on top of this cell. The recurrent region attention cell in the RRA iterates over the temporal sequence of the video, so that the attention performance of the RRA gradually improves. Second, we propose a Video Frame Attention model (VFA) that highlights the more important frames in the whole action video sequence, reducing the interference caused by similarity between heterogeneous action video sequences. Finally, we propose an end-to-end trainable network: the Two-level Attention Model based video action recognition network (TAMNet). We evaluate it on two video action recognition benchmark datasets, UCF101 and HMDB51. Experiments show that our end-to-end TAMNet reliably focuses on the more important frames in a video sequence and effectively captures the action-relevant regional visual information in the spatial features of each frame. Inspired by the two-stream architecture, we further construct a two-modality TAMNet, which achieves the best performance on both datasets under the same training conditions.
Highlights
Video action recognition has long been a research hotspot in computer vision; its goal is to identify the action taking place in an unknown video or image sequence
To address these challenges, this paper makes the following contributions: (1) We propose a recurrent region attention cell that effectively captures action-relevant regional visual information in the spatial features of video frames, reducing the interference of redundant and noisy information on action spatial features
The recurrent region attention cell in the Recurrent Region Attention model (RRA) iterates over the temporal sequence of the video, so that the RRA effectively captures the action-relevant regional visual information in the spatial features of each frame of the action video sequence
Summary
Video action recognition has long been a research hotspot in computer vision; its goal is to identify the action taking place in an unknown video or image sequence. (1) We propose a recurrent region attention cell that effectively captures action-relevant regional visual information in the spatial features of video frames, reducing the interference of redundant and noisy information on action spatial features. The recurrent region attention cell in the RRA iterates over the temporal sequence of the video, so that the RRA effectively captures the action-relevant regional visual information in the spatial features of each frame. (2) We propose a Video Frame Attention model (VFA) that highlights the more important frames in the whole action video sequence. (3) We propose a Two-level Attention Model based video action recognition network (TAMNet) that can be trained end to end.
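The two-level scheme described above (region attention applied per frame, iterated over the temporal sequence, followed by frame-level attention over the whole sequence) can be sketched in a minimal, illustrative form. This is not the authors' actual RRA/VFA implementation; the recurrent update, scoring functions, and dimensions below are simplified assumptions chosen only to show the control flow of two-level attention:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax for attention weights.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def region_attention(frame_feat, h):
    """First level: weight the R regional features of one frame
    using the recurrent state h (a toy stand-in for the RRA cell)."""
    # frame_feat: (R, D) regional features; h: (D,) recurrent state
    scores = frame_feat @ h            # (R,) relevance of each region
    weights = softmax(scores)          # attention over regions
    return weights @ frame_feat        # (D,) attended spatial feature

def two_level_attention(video_feats):
    """video_feats: (T, R, D) = T frames, R regions, D-dim features.
    Returns a single D-dim video descriptor."""
    T, R, D = video_feats.shape
    h = np.zeros(D)                    # initial recurrent state
    frame_vecs = []
    for t in range(T):                 # iterate over the temporal sequence
        attended = region_attention(video_feats[t], h)
        h = np.tanh(h + attended)      # toy recurrent update of the state
        frame_vecs.append(h)
    F = np.stack(frame_vecs)           # (T, D) per-frame representations
    # Second level: frame attention (VFA-like) over the whole sequence,
    # scored here against the mean frame representation as a simple proxy.
    frame_weights = softmax(F @ F.mean(axis=0))
    return frame_weights @ F           # (D,) weighted video descriptor
```

In this sketch the recurrent state carries region-attention context from frame to frame, mirroring how the RRA cell's attention is described as improving over the temporal sequence, while the final weighting plays the role of the VFA in emphasizing more informative frames.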