MGSAN: multimodal graph self-attention network for skeleton-based action recognition

Junyi Wang, Ziao Li, Bangli Liu, Haibin Cai, Mohamad Saada, Qinggang Meng

doi:10.1007/s00530-024-01566-8

Abstract

Graph convolutional networks (GCNs) have driven remarkable progress in skeleton-based action recognition. However, current models treat a skeleton sequence as a series of graphs and aggregate features over the entire sequence by alternately extracting spatial and temporal features, i.e., a 2D (spatial) plus 1D (temporal) approach to feature extraction. This overlooks the complex spatiotemporal fusion relationships between joints during motion, making it difficult for models to capture connections across different temporal frames and joints. In this paper, we propose a Multimodal Graph Self-Attention Network (MGSAN), which combines GCNs with self-attention to model the spatiotemporal relationships within skeleton sequences. First, we design graph self-attention (GSA) blocks to capture the intrinsic topology and long-term temporal dependencies between joints. Second, we propose a multi-scale spatio-temporal convolutional network for channel-wise topology modeling (CW-TCN) to model the short-term, smooth temporal information of joint movements. Finally, we propose a multimodal fusion strategy that fuses joint, joint movement, and bone flow, giving the model a richer set of multimodal features for prediction. The proposed MGSAN achieves state-of-the-art performance on three large-scale skeleton-based action recognition datasets, with accuracies of 93.1% on the NTU RGB+D 60 cross-subject benchmark, 90.3% on the NTU RGB+D 120 cross-subject benchmark, and 97.0% on the NW-UCLA dataset. Code is available at https://github.com/lizaowo/MGSAN.
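
The three input modalities named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes that a bone is the vector from a joint to its parent, that "joint movement" and "bone flow" are frame-to-frame differences of joints and bones respectively, and that streams are combined by a weighted sum of class scores, as is common in multi-stream skeleton models. The toy 5-joint skeleton, the PARENTS array, and all function names are hypothetical.

```python
import numpy as np

# Hypothetical toy skeleton with 5 joints (not the NTU RGB+D layout).
# PARENTS[v] is the parent of joint v; the root (joint 0) points to itself.
PARENTS = np.array([0, 0, 1, 2, 1])


def bone_stream(joints):
    """Bone vectors for joints of shape (T, V, C): each joint minus its parent."""
    return joints - joints[:, PARENTS, :]


def motion_stream(x):
    """Frame-to-frame difference, zero-padded at the last frame to keep the shape."""
    motion = np.zeros_like(x)
    motion[:-1] = x[1:] - x[:-1]
    return motion


def fuse_scores(scores, weights):
    """Assumed late fusion: weighted sum of per-stream class scores."""
    return sum(w * s for w, s in zip(weights, scores))


rng = np.random.default_rng(0)
joints = rng.normal(size=(8, 5, 3))             # T=8 frames, V=5 joints, C=3 coords
joint_motion = motion_stream(joints)            # assumed "joint movement" stream
bone_flow = motion_stream(bone_stream(joints))  # assumed "bone flow" stream
```

Each stream keeps the (T, V, C) shape of the input, so the same network architecture could in principle be trained on any of them before their scores are passed to fuse_scores.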

Journal: Multimedia Systems
Publication Date: Nov 27, 2024
License type: CC BY 4.0
