Abstract

Traditional convolution neural networks have achieved great success in human action recognition. However, it is challenging to establish effective associations between different human bone nodes to capture detailed information. In this paper, we propose a dual attention-guided multiscale dynamic aggregate graph convolution neural network (DAG-GCN) for skeleton-based human action recognition. Our goal is to explore the best correlation and determine high-level semantic features. First, a multiscale dynamic aggregate GCN module is used to capture important semantic information and to establish dependence relationships for different bone nodes. Second, the higher level semantic feature is further refined, and the semantic relevance is emphasized through a dual attention guidance module. In addition, we exploit the relationship of joints hierarchically and the spatial temporal correlations through two modules. Experiments with the DAG-GCN method result in good performance on the NTU-60-RGB+D and NTU-120-RGB+D datasets. The accuracy is 95.76% and 90.01%, respectively, for the cross (X)-View and X-Subon the NTU60dataset.

Highlights

  • Human action recognition is widely used in many scenarios, such as human-computer interaction [1], video retrieval [2], and medical treatment security [3]

  • dynamic aggregate graph convolutional network (DAG-graph convolutional network (GCN)) framework for skeleton-based human action recognition, and we provide the experimental results and analysis

  • We present the results of ablation experiments on multiscale dynamic aggregate operations and show the efficiency of the DAG-GCN recognition framework

Read more

Summary

Introduction

Human action recognition is widely used in many scenarios, such as human-computer interaction [1], video retrieval [2], and medical treatment security [3]. With the development of deep learning technology, human skeleton action recognition based on joint type, frame index, and 3D position identification has been widely studied. Compared with RGB human action video, skeleton data are more robust and computationally efficient. To improve the recognition accuracy of skeleton movements, researchers need to use deep learning technology to simulate the spatial-temporal nature of bone sequences [5,6]. RNN/LSTM uses short-term and long-term timing sequence dynamics to model the bone sequence, while CNN adjusts the bone data to the appropriate input (224 × 224) and learns the correlation

Objectives
Results
Conclusion

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.