An efficient self-attention network for skeleton-based action recognition

Xiaofei Qin, Xuedian Zhang, Rui Cai, Jiabin Yu, Changxiang He

doi:10.1038/s41598-022-08157-5

Abstract

There has been significant progress in skeleton-based action recognition. Because the human skeleton can be naturally represented as a graph, graph convolutional networks have become the most popular approach to this task, and most state-of-the-art methods improve performance by optimizing the structure of the skeleton graph. Building on these algorithms, a simple but strong network is proposed with three major contributions. First, inspired by adaptive graph convolutional networks and non-local blocks, several self-attention modules are designed to exploit spatial and temporal dependencies and to optimize the graph structure dynamically. Second, a lightweight and efficient network architecture is designed for skeleton-based action recognition. Third, a trick is proposed that enriches the skeleton data with bone connection information and noticeably improves performance. The method achieves 90.5% accuracy on the cross-subject setting of NTU60, with 0.89M parameters and 0.32 GMACs of computation cost. This work is expected to inspire new ideas for the field.
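To make the idea of a dynamically optimized graph more concrete, the following is a minimal sketch of generic scaled dot-product self-attention over the joints of a single frame. It is not the authors' module: the function name `joint_self_attention`, the projection matrices, the feature sizes, and the random inputs are all assumptions chosen for illustration. The point it shows is that the attention map is a data-dependent V × V matrix that can play the role of a learned adjacency between joints.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def joint_self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over joints (illustrative only).

    x:   (V, C) features for V joints in one frame.
    w_*: (C, D) projection matrices (hypothetical, randomly initialised here).
    Returns the attended features (V, D) and the attention map (V, V).
    """
    q = x @ w_q
    k = x @ w_k
    v = x @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # pairwise joint affinities
    attn = softmax(scores, axis=-1)          # each row sums to 1
    return attn @ v, attn

# Toy input: 25 joints with 3-D coordinates; all sizes are assumptions.
rng = np.random.default_rng(0)
V, C, D = 25, 3, 8
x = rng.standard_normal((V, C))
w_q, w_k, w_v = (rng.standard_normal((C, D)) * 0.1 for _ in range(3))

out, attn = joint_self_attention(x, w_q, w_k, w_v)
print(out.shape, attn.shape)
```

Unlike a fixed skeleton adjacency, `attn` changes with the input pose, so joints that are not physically connected can still exchange information within a single layer.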

Highlights

There has been significant progress in skeleton-based action recognition
Human action recognition is an important task that can be used in video analysis, human-computer interaction and so on [1–3]
A trick is used that plays an important role in achieving better performance

Summary

Methods

Compared methods: ST-GCN [14], AS-GCN [18], 2s-AGCN [7], DGNN [32], MS-AAGCN [12], MS-G3D [11], MST (2s) [33], Double-head (joint) [34], Double-head (2s) [34], and ours.

The proposed network is very lightweight, with 0.89M parameters and 0.32 GMACs of computation cost. Most previous methods are based on ST-GCN [14, 37], where every sequence contains 150 frames. In the proposed method, with 20 frames, fewer CNN layers are enough to model the temporal dimension. The motion of every joint and bone is also computed; because it carries temporal information, it makes time easier to model. The proposed method, based on a self-attention mechanism, can exploit long-range dependencies better with fewer stacked layers. The proposed network is too lightweight to model such complex data and does not achieve very impressive performance on these two datasets.
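The input preparation described above (a short 20-frame sequence, bone information, and joint and bone motion) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: uniform frame sampling, bones defined as the vector from a parent joint to its child, motion defined as a frame-to-frame difference, and the toy 5-joint `parents` list are all hypothetical choices.

```python
import numpy as np

def sample_frames(seq, num_frames=20):
    """Uniformly pick num_frames frames from a (T, V, C) sequence (assumed scheme)."""
    idx = np.linspace(0, seq.shape[0] - 1, num_frames).round().astype(int)
    return seq[idx]

def bone_vectors(joints, parents):
    """Bone = joint minus its parent joint; the root is its own parent, so its bone is zero."""
    return joints - joints[:, parents, :]

def motion(x):
    """Difference between consecutive frames; the last frame is padded with zeros."""
    m = np.zeros_like(x)
    m[:-1] = x[1:] - x[:-1]
    return m

# Toy skeleton: a 5-joint chain, with joint 0 as the root (hypothetical layout).
parents = [0, 0, 1, 2, 3]
rng = np.random.default_rng(0)
seq = rng.standard_normal((150, 5, 3))  # 150 frames, 5 joints, 3-D coordinates

joints = sample_frames(seq, 20)
bones = bone_vectors(joints, parents)
features = np.concatenate(
    [joints, bones, motion(joints), motion(bones)], axis=-1
)
print(features.shape)
```

Stacking the four streams along the channel axis is only one way to combine them; they could equally be fed to separate branches of a network.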

Journal: Scientific Reports
Publication Date: Mar 8, 2022
Citations: 16
License type: open-access

Similar Papers

Learning graph structure via graph convolutional networks
Qi Zhang ... Chunhong Pan
Pattern Recognition | VOL. 95 | 02 Jul 2019

Whole and Part Adaptive Fusion Graph Convolutional Networks for Skeleton-Based Action Recognition
Qi Zuo ... Dongqian Li
Sensors | VOL. 20 | 13 Dec 2020

Predictively encoded graph convolutional network for noise-robust skeleton-based action recognition
Yongsang Yoon ... Jongmin Yu
Applied Intelligence | VOL. 52 | 09 Jun 2021

Skeleton-Based Action Recognition with Multi-Stream Adaptive Graph Convolutional Networks
Lei Shi ... Hanqing Lu
IEEE Transactions on Image Processing | VOL. PP | 01 Jan 2020
