Abstract
Video action recognition is an important problem in computer vision. In this paper, we develop a novel model, named Separable ConvNet Spatiotemporal Mixer (SCSM). Our goal is to develop an efficient and lightweight action recognition backbone that can be applied to multi-task models to improve both accuracy and processing speed. SCSM introduces a new hierarchical spatial compression with a spatiotemporal fusion scheme consisting of a spatial domain and a temporal domain: each frame is processed independently in the spatial domain for feature extraction, and the resulting features are fused across frames in the temporal domain. The architecture is highly scalable and can be adapted to different frame rate requirements, making it suitable as a backbone for multi-task video feature extraction or for industrial applications with low prediction and training costs. Experimental results show that SCSM has a small number of parameters and low computational complexity, giving it high scalability and strong transfer learning capabilities. The model achieves video action recognition accuracy comparable to state-of-the-art models with a smaller parameter size and fewer computational requirements.
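The two-stage scheme described above (frame-independent spatial feature extraction, then fusion along the time axis) can be sketched with plain numpy as follows. This is a minimal illustration of the data flow only, not the paper's actual architecture: the function names, the global-average-pool extractor, the fixed random projection, and the mean-based temporal fusion are all hypothetical stand-ins chosen for clarity.

```python
import numpy as np

def spatial_extract(frame, out_dim=8):
    # Hypothetical per-frame feature extractor: global average pooling
    # over spatial positions, followed by a fixed linear projection.
    pooled = frame.mean(axis=(0, 1))              # (C,)
    rng = np.random.default_rng(0)                # fixed weights for illustration
    w = rng.standard_normal((pooled.size, out_dim))
    return pooled @ w                             # (out_dim,)

def temporal_mix(features):
    # Hypothetical temporal fusion: a simple mean over the time axis
    # stands in for the paper's spatiotemporal mixer.
    return features.mean(axis=0)

def scsm_sketch(video):
    # video: (T, H, W, C). Each frame is encoded independently
    # (spatial domain), then the per-frame features are fused
    # along time (temporal domain).
    feats = np.stack([spatial_extract(f) for f in video])  # (T, out_dim)
    return temporal_mix(feats)                             # (out_dim,)

clip = np.zeros((16, 32, 32, 3))   # 16 frames of 32x32 RGB
print(scsm_sketch(clip).shape)     # -> (8,)
```

Because the spatial stage treats frames independently, the same extractor weights apply regardless of clip length, which is what makes the design adaptable to different frame rates.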