Abstract

3D CNNs are powerful tools for action recognition that can intuitively extract spatio-temporal features from raw videos. However, most existing 3D CNNs have not fully accounted for the adverse effects of the background motion that frequently appears in videos. Background motion is often misclassified as part of the human action, which can undermine modeling of the action's dynamic pattern. In this paper, we propose the residual attention unit (RAU) to address this problem. RAU suppresses background motion by upweighting the values associated with the foreground region in the feature maps. Specifically, RAU contains two separate submodules in parallel, i.e., spatial attention and channel-wise attention. Given an intermediate feature map, the spatial attention works in a bottom-up top-down manner to generate an attention mask, while the channel-wise attention recalibrates the feature responses of all channels automatically. Since applying the attention mechanism directly to the input features may discard discriminative information, we design a bypass that preserves the integrity of the original features via a shortcut connection between the input and output of the attention module. Notably, RAU can be embedded into 3D CNNs easily and trained end-to-end along with the network. Experimental results on UCF101 and HMDB51 demonstrate the validity of our RAU.
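The structure described above (parallel spatial and channel-wise attention, fused with a shortcut to the input) can be sketched in minimal NumPy form. This is an illustrative approximation only, not the paper's implementation: the learned convolutions and excitation weights are replaced by identity/pooling operations, the 2x down/up-sampling stands in for the bottom-up top-down path, and the additive fusion of the two submodules is an assumption.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat):
    # feat: (C, T, H, W). Global average pooling per channel, then a
    # sigmoid gate recalibrates the channel responses. (The paper would
    # learn an excitation mapping here; we use identity for illustration.)
    pooled = feat.mean(axis=(1, 2, 3))            # (C,)
    gate = sigmoid(pooled)                        # (C,)
    return feat * gate[:, None, None, None]

def spatial_attention(feat):
    # Bottom-up: 2x spatial downsampling by average pooling.
    # Top-down: nearest-neighbour upsampling back to the input size,
    # then a sigmoid produces the spatial attention mask.
    c, t, h, w = feat.shape
    down = feat.reshape(c, t, h // 2, 2, w // 2, 2).mean(axis=(3, 5))
    up = down.repeat(2, axis=2).repeat(2, axis=3)  # back to (C, T, H, W)
    return feat * sigmoid(up)

def residual_attention_unit(feat):
    # Parallel spatial and channel-wise attention, plus the bypass
    # (shortcut) that preserves the original input features.
    attended = spatial_attention(feat) + channel_attention(feat)
    return feat + attended
```

Because the shortcut adds the untouched input back in, the unit degrades gracefully: even a poorly estimated mask cannot zero out discriminative features, which is the motivation for the bypass stated in the abstract.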
