Abstract

In skeleton-based human action recognition, methods based on graph convolutional networks have achieved great success recently. However, most graph neural networks treat the skeleton as a spatiotemporally uncorrelated graph and rely on a predetermined adjacency matrix, ignoring the spatiotemporal relevance of human actions and incurring significant computational cost. Meanwhile, methods that use graph convolution focus too much on the neighboring nodes of each joint and ignore the action as a whole. In this work, we propose a lightweight and efficient neural network called NLB-ACSE, based on the Graph Convolutional Network (GCN). Our model consists of two large branches: a non-local block branch that focuses on long-distance features and an adaptive cross-spacetime edge branch that focuses on short-distance features. Both branches extract information across time and space, and together they capture long-range and short-range information. Some simple but effective strategies are also applied to our model, such as semantics, max pooling, and fusion inputs, which add few parameters yet improve accuracy in the ablation study. The proposed method, which is an order of magnitude smaller than most previous models, is evaluated on three large datasets: NTU60, NTU120, and Northwestern-UCLA. The experimental results show that our method achieves state-of-the-art performance.

Highlights

  • In recent years, human action recognition has found many applications in the real world, such as human-computer interaction, robot technology, and health care systems [1]–[3]

  • Our work focuses on the task of skeleton-based action recognition

  • Some methods [12] improve on the ST-GCN approach by extracting global features through graph convolution over high-order polynomials of the skeleton's adjacency matrix

Summary

INTRODUCTION

In recent years, human action recognition has found many applications in the real world, such as human-computer interaction, robot technology, and health care systems [1]–[3]. Some methods [12] improve on the ST-GCN approach by extracting global features through graph convolution over high-order polynomials of the skeleton's adjacency matrix. A second problem lies in the spatial-only and temporal-only modules (Fig. 1). Most existing approaches [10], [11] follow ST-GCN: they first use graph convolutions to extract spatial relationships at each time step, and then model the temporal dynamics with 1D convolutional layers. Compared with local graph convolution, a non-local block is better at capturing long-range dependencies across space-time, giving each joint a global receptive field. With a smaller number of parameters, our model achieves state-of-the-art performance on three large-scale datasets for skeleton-based action recognition.
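
The contrast between local graph convolution and a non-local block can be illustrated with a small NumPy sketch. It is not the NLB-ACSE implementation: the five-joint chain skeleton, the function names, the absence of learned projections, and the scaling of the dot products by the square root of the channel count are all assumptions made for illustration. The first part shows why reaching distant joints through the adjacency matrix requires high-order powers; the second part shows a non-local operation in which every (frame, joint) position attends to every other position in a single step.

```python
import numpy as np


def chain_adjacency(num_joints):
    """Adjacency with self-loops for a toy skeleton whose joints form a line."""
    A = np.eye(num_joints)
    for i in range(num_joints - 1):
        A[i, i + 1] = A[i + 1, i] = 1.0
    return A


def k_hop_reach(A, k):
    """Boolean mask of joint pairs linked within k hops (nonzero entries of A^k)."""
    return np.linalg.matrix_power(A, k) > 0


def non_local_block(x):
    """Simplified non-local operation over space-time (no learned weights).

    x has shape (T, V, C): T frames, V joints, C channels.
    Returns the residual output and the (T*V, T*V) attention weights.
    """
    T, V, C = x.shape
    flat = x.reshape(T * V, C)                    # one row per (frame, joint)
    scores = flat @ flat.T / np.sqrt(C)           # pairwise similarity (assumed scaling)
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability for softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True) # softmax over all positions
    out = weights @ flat                          # aggregate features globally
    return (flat + out).reshape(T, V, C), weights


if __name__ == "__main__":
    A = chain_adjacency(5)
    # One graph-convolution hop does not connect the two ends of the chain.
    print("1-hop reach, joint 0 to 4:", bool(k_hop_reach(A, 1)[0, 4]))
    # The fourth power of the adjacency matrix does.
    print("4-hop reach, joint 0 to 4:", bool(k_hop_reach(A, 4)[0, 4]))

    x = np.random.default_rng(0).normal(size=(3, 5, 4))
    y, w = non_local_block(x)
    # Every position has a nonzero weight on every other position.
    print("output shape:", y.shape, "all weights positive:", bool((w > 0).all()))
```

In this toy setting, a single graph-convolution layer only mixes adjacent joints within a frame, whereas the non-local weights are dense over all frames and joints, which is the sense in which each joint obtains a global receptive field.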

RELATED WORK
METHODS
SEMANTICS AND MAXPOOLING LAYERS
EXPERIMENTS
COMPARISONS WITH STATE-OF-THE-ART METHODS
ABLATION STUDY
Findings
CONCLUSION