Abstract

There has been significant progress in skeleton-based action recognition. Because the human skeleton can be naturally represented as a graph, graph convolutional networks have become the most popular approach to this task. Most state-of-the-art methods improve performance by optimizing the structure of the skeleton graph. Building on these algorithms, a simple but strong network is proposed with three major contributions. First, inspired by adaptive graph convolutional networks and non-local blocks, several self-attention modules are designed to exploit spatial and temporal dependencies and to dynamically optimize the graph structure. Second, a lightweight, efficient network architecture is designed for skeleton-based action recognition. Third, a trick is proposed that enriches the skeleton data with bone connection information and clearly improves performance. The method achieves 90.5% accuracy on the cross-subject setting of NTU60, with 0.89M parameters and 0.32 GMACs of computation cost. This work is expected to inspire new ideas for the field.
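To make the idea of a dynamically optimized graph more concrete, the following is a minimal sketch of spatial self-attention over joints, assuming plain scaled dot-product attention. It is not the authors' module: the function names, the projection matrices `w_q` and `w_k`, and the sizes (25 joints, 3 coordinates, 8-dimensional projections) are illustrative assumptions. The attention matrix plays the role of a data-dependent adjacency matrix, computed per frame instead of being fixed by the skeleton.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def spatial_self_attention(x, w_q, w_k):
    """Scaled dot-product attention between joints (illustrative sketch).

    x:   (T, V, C) features for T frames, V joints, C channels.
    w_q: (C, D) query projection (hypothetical learned weights).
    w_k: (C, D) key projection (hypothetical learned weights).
    Returns the (T, V, V) attention maps and the (T, V, C) aggregated features.
    """
    q = x @ w_q                                   # (T, V, D)
    k = x @ w_k                                   # (T, V, D)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])  # (T, V, V)
    attn = softmax(scores, axis=-1)               # each row sums to 1
    out = attn @ x                                # aggregate joint features
    return attn, out

# Assumed sizes: 20 frames, 25 joints, 3D coordinates, 8-dim projections.
rng = np.random.default_rng(0)
T, V, C, D = 20, 25, 3, 8
x = rng.standard_normal((T, V, C))
w_q = rng.standard_normal((C, D))
w_k = rng.standard_normal((C, D))
attn, out = spatial_self_attention(x, w_q, w_k)
```

Each row of `attn` weights how strongly one joint attends to every other joint in the same frame, so distant joints can interact in a single layer without being connected in the physical skeleton.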

Highlights

  • There has been significant progress in skeleton-based action recognition

  • Human action recognition is an important task that can be used in video analysis, human-computer interaction and so on [1–3]

  • A trick is used that plays an important role in achieving better performance

Summary

Methods

Compared methods: ST-GCN [14], AS-GCN [18], 2s-AGCN [7], DGNN [32], MS-AAGCN [12], MS-G3D [11], MST (2s) [33], Double-head (joint) [34], Double-head (2s) [34], and ours.

The proposed network is very lightweight, with 0.89M parameters and 0.32 GMACs of computation cost. Most previous methods are based on ST-GCN [14, 37] and use sequences of 150 frames. The proposed method uses only 20 frames, so fewer CNN layers are enough to model the temporal dimension. The motion of every joint and bone is also computed, which carries temporal information and makes time easier to model. Because it is based on a self-attention mechanism, the proposed method can exploit long-range dependencies better with fewer stacked layers. However, the proposed network is too lightweight to model such complex data and does not achieve very impressive performance on these two datasets.
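The bone and motion features mentioned above can be sketched as follows. This is a hedged illustration, not the paper's exact preprocessing: the 5-joint chain in `BONES`, the helper names, and the choice to concatenate all four streams along the channel axis are assumptions made for the example. A bone is taken as the vector from a parent joint to its child, and motion as the difference between consecutive frames.

```python
import numpy as np

# Hypothetical 5-joint chain given as (child, parent) pairs.
# A real dataset defines its own, larger bone list.
BONES = [(1, 0), (2, 1), (3, 2), (4, 3)]

def bone_features(joints, bones):
    """joints: (T, V, C). Each child joint stores child - parent; the root stays zero."""
    out = np.zeros_like(joints)
    for child, parent in bones:
        out[:, child] = joints[:, child] - joints[:, parent]
    return out

def motion_features(x):
    """Difference between consecutive frames, zero-padded at the last frame."""
    out = np.zeros_like(x)
    out[:-1] = x[1:] - x[:-1]
    return out

def enrich(joints, bones):
    """Concatenate joints, bones, and their motions along the channel axis (assumed fusion)."""
    b = bone_features(joints, bones)
    return np.concatenate(
        [joints, b, motion_features(joints), motion_features(b)], axis=-1
    )

# Toy input: 20 frames, 5 joints, 3 coordinates.
T, V, C = 20, 5, 3
joints = np.arange(T * V * C, dtype=np.float64).reshape(T, V, C)
features = enrich(joints, BONES)
```

With this layout the input grows from 3 to 12 channels per joint, and the temporal differences give the network explicit motion cues even when only a short clip of frames is used.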

