Semantic Adversarial Network with Multi-Scale Pyramid Attention for Video Classification

De Xie, Cheng Deng, Dapeng Tao, Chao Li, Hao Wang

doi:10.1609/aaai.v33i01.33019030

Abstract

Two-stream architectures have shown strong performance in video classification tasks. The key idea is to learn spatiotemporal features by fusing convolutional networks spatially and temporally. However, such architectures have several problems. First, they rely on optical flow to model temporal information, which is often expensive to compute and store. Second, they have limited ability to capture details and local context information in video data. Third, they lack explicit semantic guidance, which greatly decreases classification performance. In this paper, we propose a new two-stream-based deep framework for video classification that discovers spatial and temporal information from RGB frames only. Moreover, a multi-scale pyramid attention (MPA) layer and a semantic adversarial learning (SAL) module are introduced and integrated into our framework. The MPA enables the network to capture global and local features to generate a comprehensive representation of a video, and the SAL makes this representation gradually approximate the real video semantics in an adversarial manner. Experimental results on two public benchmarks demonstrate that our proposed method achieves state-of-the-art results on standard video datasets.
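The abstract does not specify how the MPA layer is built, so the following is only an illustrative sketch of the general idea it names: pool features at several scales (global and local), then combine the pooled regions with attention weights. It is not the authors' implementation. The function names `average_pool` and `pyramid_attention`, the `query` vector, the `scales` tuple, and the choice to pool along the frame (temporal) axis are all assumptions made for this example; the paper may instead operate on spatial feature maps with learned parameters.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def average_pool(frames, num_segments):
    """Split the frame sequence into contiguous segments and average each one.

    frames: list of per-frame feature vectors (lists of floats, equal length).
    """
    t = len(frames)
    if num_segments > t:
        raise ValueError("num_segments must not exceed the number of frames")
    dim = len(frames[0])
    pooled = []
    for i in range(num_segments):
        start = (i * t) // num_segments
        end = ((i + 1) * t) // num_segments
        chunk = frames[start:end]
        pooled.append([sum(f[d] for f in chunk) / len(chunk) for d in range(dim)])
    return pooled


def pyramid_attention(frames, query, scales=(1, 2, 4)):
    """Hypothetical multi-scale attention pooling (assumption, not the paper's layer).

    Scale 1 yields one global region; larger scales yield more local regions.
    Each region is scored by a dot product with `query`, and the scores are
    normalized with softmax to weight the regions in the final representation.
    """
    regions = []
    for scale in scales:
        regions.extend(average_pool(frames, scale))
    scores = [sum(q * x for q, x in zip(query, region)) for region in regions]
    weights = softmax(scores)
    dim = len(query)
    representation = [
        sum(w * region[d] for w, region in zip(weights, regions)) for d in range(dim)
    ]
    return representation, weights
```

With four frames and `scales=(1, 2, 4)`, the pyramid contains 1 + 2 + 4 = 7 regions, and the attention weights over those regions sum to one. In a trained network the `query` would be learned rather than supplied by hand.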

Journal: Proceedings of the ... AAAI Conference on Artificial Intelligence. AAAI Conference on Artificial Intelligence
Publication Date: Jul 17, 2019
Citations: 14

Similar Papers

Tensor distance based multilinear globality preserving embedding: A unified tensor based dimensionality reduction framework for image and video classification
Yang Liu ... Keith C.C Chan
Expert systems with applications | VOL. 39
15 Mar 2012

Short Video Representation Learning Based on Convolution Network with Text Attention Mechanism
Botao Zhu ... Keying Yang
01 Dec 2020

On the Use of Deep Learning for Video Classification
Atiq ur Rehman ... Md Alamgir Kabir
Applied sciences | VOL. 13
03 Feb 2023

Multi-Modal Low-Data-Based Learning for Video Classification
Erol Citak ... Mine Elif Karsligil
Applied sciences | VOL. 14
17 May 2024
