Abstract
Temporal reasoning is crucial for action recognition tasks. Previous works use 3D CNNs to jointly capture spatiotemporal information, but at a high computational cost. To address this problem, we propose a general channel split spatiotemporal network (CSST-Net) for effective spatiotemporal feature representation learning. The CSST module consists of a grouped spatiotemporal modeling (GSTM) module and a parameter-free feature fusion (PFFF) module. The GSTM module splits features along the channel dimension into spatial and temporal parts that are processed in parallel, focusing on spatial and temporal cues respectively. In the temporal branch, we combine group-wise and point-wise convolutions to reduce the number of parameters, thereby alleviating the overfitting of 3D CNNs. For spatiotemporal feature fusion, the PFFF module recalibrates and fuses spatial and temporal features through a soft attention mechanism without introducing extra parameters, ensuring effective information flow through the network. Extensive experiments on three benchmark datasets (Sth-Sth V1, Sth-Sth V2, and Jester) show that CSST-Net achieves competitive performance compared to existing methods while significantly reducing the number of parameters and FLOPs relative to 3D CNN baselines.
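As a concrete illustration of the channel split idea, below is a minimal PyTorch sketch assuming a 1x3x3 spatial convolution, a group-wise 3x1x1 temporal convolution followed by a point-wise convolution, and a parameter-free softmax fusion over pooled channel descriptors. The specific kernel sizes, split ratio, group count, and fusion rule are our assumptions for illustration, not the paper's exact design.

```python
# A minimal sketch of a channel-split spatiotemporal block, assuming an
# even 50/50 channel split and a softmax-based parameter-free fusion.
# Hyperparameters here are illustrative, not taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CSSTBlock(nn.Module):
    def __init__(self, channels: int, groups: int = 8):
        super().__init__()
        assert channels % 2 == 0
        half = channels // 2
        assert half % groups == 0
        # Spatial branch: per-frame (1x3x3) convolution on half of the channels.
        self.spatial = nn.Conv3d(half, half, kernel_size=(1, 3, 3),
                                 padding=(0, 1, 1))
        # Temporal branch: group-wise temporal (3x1x1) convolution followed by
        # a point-wise (1x1x1) convolution, keeping the parameter count low.
        self.temporal_gw = nn.Conv3d(half, half, kernel_size=(3, 1, 1),
                                     padding=(1, 0, 0), groups=groups)
        self.temporal_pw = nn.Conv3d(half, half, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, T, H, W). Split channels into spatial/temporal halves.
        xs, xt = torch.chunk(x, 2, dim=1)
        fs = self.spatial(xs)
        ft = self.temporal_pw(self.temporal_gw(xt))
        # Parameter-free soft-attention fusion (an assumed instantiation of
        # PFFF): softmax over the two branches' pooled channel descriptors.
        ds = F.adaptive_avg_pool3d(fs, 1)                 # (N, C/2, 1, 1, 1)
        dt = F.adaptive_avg_pool3d(ft, 1)
        w = torch.softmax(torch.stack([ds, dt]), dim=0)   # (2, N, C/2, 1, 1, 1)
        # Recalibrate each branch and restore the input channel width.
        return torch.cat([w[0] * fs, w[1] * ft], dim=1)


if __name__ == "__main__":
    block = CSSTBlock(channels=64)
    clip = torch.randn(2, 64, 8, 56, 56)  # (batch, channels, frames, H, W)
    print(block(clip).shape)              # torch.Size([2, 64, 8, 56, 56])
```

Because the attention weights are computed purely from pooled feature statistics, the fusion step adds no learnable parameters, which is the property the abstract attributes to the PFFF module.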