Abstract

Event recognition in surveillance video has received extensive attention from the computer vision community, yet it still faces substantial challenges due to small inter-class variations caused by factors such as severe occlusion and cluttered backgrounds. To address these issues, we propose a spatio-temporal deep residual network with hierarchical attentions (STDRN-HA) for video event recognition. In the first attention layer, the ResNet fully connected feature guides the Faster R-CNN feature to generate object-based attention (O-attention) on target objects. In the second attention layer, the O-attention in turn guides the ResNet convolutional feature to yield holistic attention (H-attention), which perceives more details of occluded objects and the global background. In the third attention layer, the attention maps are applied to the deep features to obtain attention-enhanced features. These attention-enhanced features are then fed into a deep residual recurrent network that mines further event cues from the video. Furthermore, an optimized loss function named softmax-RC is designed; it embeds residual-block regularization and center loss to mitigate vanishing gradients in the deep network and to enlarge the inter-class distance. We also build a temporal branch to exploit long- and short-term motion information, and the final results are obtained by fusing the outputs of the spatial and temporal streams. Experiments on four realistic video datasets, CCV, VIRAT 1.0, VIRAT 2.0, and HMDB51, demonstrate that the proposed method performs well and achieves state-of-the-art results.
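As a concrete illustration of the attention-enhancement step described above, the following minimal PyTorch sketch (our own assumption, not the authors' published code) re-weights convolutional features with an attention map derived from a guidance vector. The dot-product scoring, the tensor shapes, and the residual addition are all illustrative choices; the function name attention_enhance is hypothetical.

    import torch
    import torch.nn.functional as F

    def attention_enhance(conv_feat, guide_vec):
        """conv_feat: (B, C, H, W) convolutional features (e.g., from ResNet).
        guide_vec: (B, C) guidance feature (e.g., a summary of the O-attention).
        Returns attention-enhanced features of shape (B, C, H, W)."""
        B, C, H, W = conv_feat.shape
        flat = conv_feat.view(B, C, H * W)                    # (B, C, HW)
        # Score each spatial location by its similarity to the guidance vector.
        scores = torch.einsum('bc,bcl->bl', guide_vec, flat)  # (B, HW)
        attn = F.softmax(scores, dim=1).view(B, 1, H, W)      # attention map
        # Re-weight the features; the residual add keeps the original signal.
        return conv_feat + conv_feat * attn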
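Similarly, the softmax-RC objective can be pictured as standard softmax cross-entropy augmented with the center loss of Wen et al. (2016) and a residual-block regularization term. The abstract does not specify the regularizer, so the sketch below treats it as an opaque res_penalty input; the class name SoftmaxRCLoss and the hyperparameters lambda_c and lambda_r are hypothetical.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SoftmaxRCLoss(nn.Module):
        """Hypothetical reconstruction: softmax cross-entropy plus center loss,
        with an unspecified residual-block regularization term."""
        def __init__(self, num_classes, feat_dim, lambda_c=0.01, lambda_r=1e-4):
            super().__init__()
            self.lambda_c = lambda_c
            self.lambda_r = lambda_r
            # One learnable center per event class, trained jointly.
            self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

        def forward(self, features, logits, labels, res_penalty=None):
            # Softmax cross-entropy over the event classes.
            ce = F.cross_entropy(logits, labels)
            # Center loss: pull each feature toward its class center so that
            # classes separate in feature space.
            centers_batch = self.centers[labels]                    # (B, D)
            center = 0.5 * (features - centers_batch).pow(2).sum(1).mean()
            loss = ce + self.lambda_c * center
            if res_penalty is not None:  # placeholder residual-block regularizer
                loss = loss + self.lambda_r * res_penalty
            return loss

In use, the module would be called once per batch, e.g. criterion(features, logits, labels), with the centers updated by the same optimizer as the network weights.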
