Abstract

In recent years, deep convolutional neural networks (DCNNs) have been widely used for video action recognition, and attention mechanisms are increasingly applied to this task as well. In this paper, we combine temporal and spatial attention to improve video action recognition. Specifically, we learn sparse attention by computing class response maps that locate the most informative region in each video frame. Each frame is resampled using this information to form two new frames: one focusing on the most discriminative regions of the image and the other on the complementary regions. After computing sparse attention, the newly generated frames are rearranged in the order of the original video to form two new videos. These two videos are then fed into a CNN as additional inputs to reinforce the learning of discriminative regions (spatial attention). The CNN we use incorporates a frame selection strategy that allows the network to focus on only a subset of frames to complete the classification task (temporal attention). Finally, we combine the classification results of the three videos (original, discriminative, and complementary) to obtain the final prediction. Experiments on the UCF101 and HMDB51 datasets show that our approach outperforms state-of-the-art methods.
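The pipeline described above can be illustrated with a minimal sketch. This is not the authors' implementation; the function names, the CAM-style weighting of feature channels by classifier weights, the fixed threshold for splitting discriminative from complementary regions, and the equal-weight softmax fusion are all simplifying assumptions made for illustration.

```python
import numpy as np

def class_response_map(features, fc_weights, class_idx):
    """Compute a class response map for one frame (CAM-style assumption).

    features:   (C, H, W) convolutional feature map of the frame
    fc_weights: (num_classes, C) classifier weights
    Returns an (H, W) map normalized to [0, 1].
    """
    # Weight each feature channel by the classifier weight for the class.
    cam = np.tensordot(fc_weights[class_idx], features, axes=([0], [0]))
    cam -= cam.min()
    if cam.max() > 0:
        cam /= cam.max()
    return cam

def split_masks(cam, thresh=0.5):
    """Split the frame into discriminative and complementary region masks.

    The fixed threshold is an assumption; the paper resamples frames
    rather than hard-masking them.
    """
    discriminative = (cam >= thresh).astype(np.float64)
    complementary = 1.0 - discriminative
    return discriminative, complementary

def fuse_scores(score_orig, score_disc, score_comp):
    """Fuse the three streams' class scores by averaging softmax outputs.

    Equal weighting is an assumption; the combination rule is not
    specified in the abstract.
    """
    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()
    return (softmax(score_orig) + softmax(score_disc) + softmax(score_comp)) / 3.0
```

In this sketch, `class_response_map` and `split_masks` would be applied per frame to build the two derived videos, and `fuse_scores` combines the per-video classification logits into the final prediction.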

