Abstract

Graph convolutional networks (GCNs) have achieved outstanding performance on skeleton-based action recognition. However, several problems remain in existing GCN-based methods, and their spatial-temporal features are not discriminative enough. Temporal convolution with a single fixed kernel cannot extract sufficiently discriminative temporal features for different actions. In addition, only a single-scale feature is used for classification, which ignores multilevel information. In this article, we propose a novel multi-scale and multi-stream improved graph convolutional network (MM-IGCN). In each spatial-temporal block of MM-IGCN, we employ an improved temporal convolution with multiple parallel kernels to enhance the temporal features. An improved GCN and an enhanced attention module are adopted in the block to strengthen the spatial-temporal features. A multi-scale structure is introduced to action recognition for the first time to obtain multilevel information. The improved spatial-temporal blocks and the multi-scale structure compose our single-stream model. Moreover, we adopt the bone cosine distance as a novel input feature. Five streams of features (joint, bone, joint motion, bone motion, and bone cosine distance) are each fed into our single-stream model, and together they compose our MM-IGCN. Experiments on two large datasets, NTU-RGB+D and NTU-RGB+D-120, show that our single-stream model achieves state-of-the-art performance and that our MM-IGCN substantially outperforms other models.
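The five input streams named in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the 5-joint skeleton and the `PARENTS` table are hypothetical, and because the abstract names the bone cosine distance without defining it, the version below (one minus the cosine similarity between a bone and its parent bone) is only one plausible reading, labeled as an assumption.

```python
import numpy as np

# Hypothetical 5-joint toy skeleton: PARENTS[j] is the joint that joint j
# attaches to. The root (joint 0) points at itself, so its bone is zero.
PARENTS = np.array([0, 0, 1, 2, 2])


def bone_stream(joints, parents=PARENTS):
    """Bone vectors: each joint minus its parent joint. joints: (T, V, C)."""
    return joints - joints[:, parents, :]


def motion_stream(x):
    """Frame-to-frame differences, zero-padded at the last frame."""
    motion = np.zeros_like(x)
    motion[:-1] = x[1:] - x[:-1]
    return motion


def bone_cosine_distance(bones, parents=PARENTS, eps=1e-8):
    """ASSUMPTION: 1 - cosine similarity between each bone and its parent bone.

    The abstract does not define this feature; this is an illustrative guess.
    Returns an array of shape (T, V).
    """
    parent_bones = bones[:, parents, :]
    dot = (bones * parent_bones).sum(axis=-1)
    norm = np.linalg.norm(bones, axis=-1) * np.linalg.norm(parent_bones, axis=-1)
    return 1.0 - dot / (norm + eps)


def build_streams(joints):
    """Collect the five streams; each would feed its own single-stream model."""
    bones = bone_stream(joints)
    return {
        "joint": joints,
        "bone": bones,
        "joint_motion": motion_stream(joints),
        "bone_motion": motion_stream(bones),
        "bone_cosine_distance": bone_cosine_distance(bones),
    }


rng = np.random.default_rng(0)
joints = rng.normal(size=(4, 5, 3))  # 4 frames, 5 joints, 3-D coordinates
streams = build_streams(joints)
for name, value in streams.items():
    print(name, value.shape)
```

In a multi-stream setup of this kind, each stream is typically passed through a separately trained copy of the single-stream model and the class scores are combined afterwards; the abstract does not say how MM-IGCN fuses them.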

Highlights

  • Human motion recognition has a wide range of applications in video surveillance, healthcare, smart homes, smart driving, and human-computer interaction [1], [2].

  • To address the limitations of existing GCN-based methods, we propose a multi-scale and multi-stream improved graph convolutional network (MM-IGCN) in this work.

  • MM-IGCN combines improved graph convolutional networks (GCNs), an enhanced attention module, and an improved temporal convolutional network (TCN).
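The idea of a temporal convolution with several parallel kernels can be sketched as follows. This is a minimal illustration, not the paper's layer: the kernel sizes (3, 5, 9), the fixed averaging weights, and the fusion of branches by summation are all assumptions, since the text above does not specify them. In a trained network the kernels would be learned and applied per channel.

```python
import numpy as np


def temporal_conv_same(x, kernel):
    """1-D convolution along the time axis with zero 'same' padding.

    x: (T, V) sequence of per-joint features; kernel: (k,) with odd k.
    Uses cross-correlation, as deep-learning "convolution" layers do.
    """
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(x, ((pad, pad), (0, 0)))  # zero-pad the time axis only
    return np.stack(
        [(padded[t:t + k] * kernel[:, None]).sum(axis=0) for t in range(x.shape[0])]
    )


def multi_kernel_temporal_conv(x, kernels):
    """Run every kernel as a parallel branch and fuse the branch outputs.

    ASSUMPTION: branches are fused by summation; concatenation along a
    channel axis would be an equally plausible choice.
    """
    return sum(temporal_conv_same(x, k) for k in kernels)


rng = np.random.default_rng(0)
x = rng.normal(size=(16, 5))  # 16 frames, 5 joints
# Hypothetical kernel sizes; simple averaging weights stand in for learned ones.
kernels = [np.ones(k) / k for k in (3, 5, 9)]
out = multi_kernel_temporal_conv(x, kernels)
print(out.shape)
```

Short kernels respond to fast movements and long kernels to slow ones, which is the usual motivation for running several temporal receptive fields side by side instead of one fixed kernel.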


Summary

INTRODUCTION

Human motion recognition has a wide range of applications in video surveillance, healthcare, smart homes, smart driving, and human-computer interaction [1], [2]. Lei et al. designed a two-stream adaptive graph convolutional network (2s-AGCN) [14] based on ST-GCN, which introduced the non-local block [15] to learn the connections between joints adaptively. (2) Existing GCN-based approaches all focus on the spatial domain to better capture correlations between joints; they overlook the need for richer temporal features in recognition.

A. CNN-BASED APPROACHES

CNN models have been used to learn spatial-temporal features from skeletons because of their excellent ability to extract high-level information.

They proposed first-order graph convolutional neural networks [11], which contain aggregation functions that define node correlations. This method is seen as a bridge between the spectral and spatial methods. We follow the spatial methods in action recognition tasks.
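For reference, the first-order graph convolution mentioned above is commonly written as the layer-wise propagation rule below. This assumes [11] refers to the widely used first-order approximation of spectral graph convolution; the symbols are the conventional ones and do not come from the summary itself.

```latex
H^{(l+1)} = \sigma\!\left( \tilde{D}^{-\frac{1}{2}} \, \tilde{A} \, \tilde{D}^{-\frac{1}{2}} \, H^{(l)} \, W^{(l)} \right),
\qquad \tilde{A} = A + I,
\qquad \tilde{D}_{ii} = \sum_{j} \tilde{A}_{ij}
```

Here $A$ is the adjacency matrix of the skeleton graph, $I$ adds self-connections, $H^{(l)}$ holds the node features at layer $l$, $W^{(l)}$ is the learnable weight matrix, and $\sigma$ is a nonlinearity. The normalized adjacency term acts as the aggregation function: each joint's new feature is a weighted average over itself and its neighbors, which is why the rule can be read both as a spectral approximation and as a spatial neighborhood operation.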

METHODS
EXPERIMENTS
Findings
CONCLUSION