Abstract
Spatiotemporal and motion feature representations are key to video action recognition. Typical prior approaches use 3D CNNs to capture both spatial and temporal features, but they suffer from heavy computational cost. Other approaches use (1+2)D CNNs to learn spatial and temporal features efficiently, but they neglect the importance of motion representations. To address these limitations, we propose a novel block that captures spatial and temporal features more faithfully while learning motion features efficiently. The proposed block comprises Motion Excitation (ME), Multi-view Excitation (MvE), and Densely Connected Temporal Aggregation (DCTA): ME encodes feature-level frame differences; MvE adaptively enriches spatiotemporal features with multiple view representations; and DCTA models long-range temporal dependencies. We inject the proposed building block, which we refer to as the META block (or simply "META"), into 2D ResNet-50. Through extensive experiments, we demonstrate that our architecture outperforms previous CNN-based methods in validation top-1 accuracy on the Something-Something v1 and Jester datasets, while META achieves competitive results on the Moments-in-Time Mini dataset.
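To make the ME idea concrete, below is a minimal PyTorch sketch of a motion-excitation-style module that encodes feature-level frame differences and turns them into channel-wise attention. It is an illustration under our own assumptions (the reduction ratio, the specific layer choices, and the (N*T, C, H, W) tensor layout), not the paper's exact implementation.

```python
import torch
import torch.nn as nn


class MotionExcitation(nn.Module):
    """Sketch of an ME-style module: encodes feature-level frame
    differences and uses them as channel-wise attention.
    Hypothetical configuration, not the paper's exact design."""

    def __init__(self, channels: int, n_frames: int, reduction: int = 16):
        super().__init__()
        self.n_frames = n_frames
        red = max(channels // reduction, 1)
        self.squeeze = nn.Conv2d(channels, red, 1)                      # reduce channels
        self.transform = nn.Conv2d(red, red, 3, padding=1, groups=red)  # transform frame t+1
        self.pool = nn.AdaptiveAvgPool2d(1)                             # spatial squeeze
        self.expand = nn.Conv2d(red, channels, 1)                       # restore channels

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N*T, C, H, W), with the T frames stacked along the batch axis
        nt, _, h, w = x.shape
        n = nt // self.n_frames
        r = self.squeeze(x).view(n, self.n_frames, -1, h, w)            # (N, T, C/r, H, W)
        nxt = self.transform(r[:, 1:].reshape(-1, r.size(2), h, w))
        nxt = nxt.view(n, self.n_frames - 1, -1, h, w)
        diff = nxt - r[:, :-1]                                          # feature-level frame difference
        diff = torch.cat([diff, torch.zeros_like(diff[:, :1])], 1)      # zero-pad the last frame
        attn = torch.sigmoid(self.expand(self.pool(diff.reshape(nt, -1, h, w))))
        return x + x * attn                                             # residual channel modulation


# Usage: a batch of 2 clips, 8 frames each, 64 channels, 56x56 feature maps.
me = MotionExcitation(channels=64, n_frames=8)
out = me(torch.randn(2 * 8, 64, 56, 56))  # output has the same shape as the input
```

The residual form `x + x * attn` lets the module emphasize motion-sensitive channels without discarding the original appearance features.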