Abstract

Event recognition in surveillance video has received extensive attention from the computer vision community, yet it still faces substantial challenges due to small inter-class variations caused by factors such as severe occlusion and cluttered backgrounds. To address these issues, we propose a spatio-temporal deep residual network with hierarchical attentions (STDRN-HA) for video event recognition. In the first attention layer, the ResNet fully connected feature guides the Faster R-CNN feature to generate object-based attention (O-attention) on target objects. In the second attention layer, the O-attention in turn guides the ResNet convolutional feature to yield holistic attention (H-attention), which perceives more details of occluded objects and the global background. In the third attention layer, the attention maps are applied to the deep features to obtain attention-enhanced features. These attention-enhanced features are then fed into a deep residual recurrent network that mines further event cues from the video. Furthermore, an optimized loss function named softmax-RC is designed; it embeds residual-block regularization and center loss to mitigate vanishing gradients in the deep network and to enlarge the inter-class distance. We also build a temporal branch to exploit long- and short-term motion information, and the final results are obtained by fusing the outputs of the spatial and temporal streams. Experiments on four realistic video datasets, CCV, VIRAT 1.0, VIRAT 2.0, and HMDB51, demonstrate that the proposed method performs well and achieves state-of-the-art results.
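As a concrete illustration of the attention-enhancement step described above, the following minimal PyTorch sketch (our own assumption, not the authors' published code) re-weights convolutional features with an attention map derived from a guidance vector. The dot-product scoring, the tensor shapes, and the residual addition are all illustrative choices; the function name attention_enhance is hypothetical.

    import torch
    import torch.nn.functional as F

    def attention_enhance(conv_feat, guide_vec):
        """conv_feat: (B, C, H, W) convolutional features (e.g., from ResNet).
        guide_vec: (B, C) guidance feature (e.g., a summary of the O-attention).
        Returns attention-enhanced features of shape (B, C, H, W)."""
        B, C, H, W = conv_feat.shape
        flat = conv_feat.view(B, C, H * W)                    # (B, C, HW)
        # Score each spatial location by its similarity to the guidance vector.
        scores = torch.einsum('bc,bcl->bl', guide_vec, flat)  # (B, HW)
        attn = F.softmax(scores, dim=1).view(B, 1, H, W)      # attention map
        # Re-weight the features; the residual add keeps the original signal.
        return conv_feat + conv_feat * attn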
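Similarly, the softmax-RC objective can be pictured as standard softmax cross-entropy augmented with the center loss of Wen et al. (2016) and a residual-block regularization term. The abstract does not specify the regularizer, so the sketch below treats it as an opaque res_penalty input; the class name SoftmaxRCLoss and the hyperparameters lambda_c and lambda_r are hypothetical.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SoftmaxRCLoss(nn.Module):
        """Hypothetical reconstruction: softmax cross-entropy plus center loss,
        with an unspecified residual-block regularization term."""
        def __init__(self, num_classes, feat_dim, lambda_c=0.01, lambda_r=1e-4):
            super().__init__()
            self.lambda_c = lambda_c
            self.lambda_r = lambda_r
            # One learnable center per event class, trained jointly.
            self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

        def forward(self, features, logits, labels, res_penalty=None):
            # Softmax cross-entropy over the event classes.
            ce = F.cross_entropy(logits, labels)
            # Center loss: pull each feature toward its class center so that
            # classes separate in feature space.
            centers_batch = self.centers[labels]                    # (B, D)
            center = 0.5 * (features - centers_batch).pow(2).sum(1).mean()
            loss = ce + self.lambda_c * center
            if res_penalty is not None:  # placeholder residual-block regularizer
                loss = loss + self.lambda_r * res_penalty
            return loss

In use, the module would be called once per batch, e.g. criterion(features, logits, labels), with the centers updated by the same optimizer as the network weights.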
