Two-Stream Collaborative Learning With Spatial-Temporal Attention for Video Classification

Yuxin Peng, Junchao Zhang, Yunzhen Zhao

doi:10.1109/tcsvt.2018.2808685

Abstract

Video classification is highly important and has widespread applications, such as video search and intelligent surveillance. Video naturally contains both static and motion information, which can be represented by frames and optical flow, respectively. Recently, researchers have generally adopted deep networks to capture the static and motion information separately, an approach that has two main limitations. First, the coexistence relationship between spatial and temporal attention is ignored, although the two should be jointly modeled as the spatial and temporal evolutions of video to learn discriminative video features. Second, the strong complementarity between static and motion information is ignored, although the two should be collaboratively learned to enhance each other. To address these two limitations, this paper proposes the two-stream collaborative learning with spatial-temporal attention (TCLSTA) approach, which consists of two models. First, in the spatial-temporal attention model, the spatial-level attention emphasizes the salient regions in a frame, and the temporal-level attention exploits the discriminative frames in a video. The two attention levels are mutually enhanced to jointly learn the discriminative static and motion features for better classification performance. Second, the static-motion collaborative model not only achieves mutual guidance between static and motion information to enhance feature learning but also adaptively learns the fusion weights of the static and motion streams, thus exploiting the strong complementarity between static and motion information to improve video classification. Experiments on four widely used data sets show that our TCLSTA approach achieves the best performance compared with more than 10 state-of-the-art methods.
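
To make the two ideas in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes each stream yields a feature tensor of shape (frames, regions, channels), scores regions and frames with simple linear scoring vectors, and fuses the per-class scores of the two streams with softmax-normalized weights. The names `spatial_temporal_pool`, `fuse_streams`, `w_spatial`, `w_temporal`, and `fusion_logits` are hypothetical, and the mutual enhancement and mutual guidance described in the abstract are not modeled here.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)


def spatial_temporal_pool(features, w_spatial, w_temporal):
    """Pool a (T, R, D) tensor into one D-dimensional video feature.

    T = frames, R = regions per frame, D = channels.
    w_spatial and w_temporal are hypothetical (D,) scoring vectors;
    in a trained model they would be learned parameters.
    """
    # Spatial-level attention: one weight per region within each frame.
    spatial_attn = softmax(features @ w_spatial, axis=1)            # (T, R)
    frame_feats = np.einsum("tr,trd->td", spatial_attn, features)   # (T, D)

    # Temporal-level attention: one weight per frame within the video.
    temporal_attn = softmax(frame_feats @ w_temporal, axis=0)       # (T,)
    video_feat = temporal_attn @ frame_feats                        # (D,)
    return video_feat, spatial_attn, temporal_attn


def fuse_streams(static_scores, motion_scores, fusion_logits):
    """Weighted sum of the two streams' class scores.

    fusion_logits is a hypothetical (2,) parameter; the softmax keeps
    the two fusion weights positive and summing to one.
    """
    weights = softmax(fusion_logits)
    fused = weights[0] * static_scores + weights[1] * motion_scores
    return fused, weights
```

In this sketch the same pooling function would be applied once to frame features (static stream) and once to optical-flow features (motion stream), and `fuse_streams` would then combine the two streams' class scores. Parameterizing the fusion weights as logits is one simple way to let them be learned by gradient descent while remaining a valid convex combination.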

Journal: IEEE Transactions on Circuits and Systems for Video Technology
Publication Date: Mar 1, 2019
Citations: 164

Similar Papers

Attention systems and neural responses to visual and auditory stimuli: an fMRI study
Chunlin Li ... Dehua Chui
01 May 2007

The Role of Motion Information in Learning Human-Robot Joint Attention
Y Nagai
18 Apr 2005

Spatial-temporal Transformers for EEG Emotion Recognition
Jiyao Liu ... Li Zhang
21 Oct 2022

Action Video Games Make Dyslexic Children Read Better
Sandro Franceschini ... Andrea Facoetti
Current Biology | VOL. 23
28 Feb 2013
