Abstract

Skeleton data convey rich information about the dynamics of the human body, which makes them well suited to human action recognition. In this work, we develop a multi-stream model with connections between its parallel streams. The work is inspired by FUSION-CPA, a state-of-the-art method that merges two modalities: infrared input and skeleton input. Because we are interested in improving the skeleton-branch backbone, we use the Spatial-Temporal Graph Convolutional Network (ST-GCN) model together with an EfficientGCN attention module, with the aim of better capturing spatial and temporal features. In addition, we exploit the Graph Convolutional Network (GCN) implemented in the ST-GCN model to capture the graph connectivity of skeletons. On the large-scale NTU-RGB+D 60 dataset, the proposed model achieves accuracy above 91% on the cross-subject benchmark and above 93% on the cross-view benchmark. It also has 9 million fewer trainable parameters than FUSION-CPA.
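To illustrate what capturing the graph connectivity of a skeleton can mean in practice, the following is a minimal sketch of a single spatial graph convolution over skeleton joints. It is not the model described above: the 5-joint toy skeleton, the symmetric normalization, and the names `EDGES`, `normalized_adjacency`, and `graph_conv` are assumptions made for illustration, and the temporal convolution, partitioning strategy, and attention module are omitted.

```python
import math

# Hypothetical 5-joint toy skeleton (NOT the NTU-RGB+D joint layout).
# Each pair is an undirected bone between two joint indices.
EDGES = [(0, 1), (1, 2), (1, 3), (1, 4)]
NUM_JOINTS = 5


def normalized_adjacency(edges, n):
    """Return D^(-1/2) (A + I) D^(-1/2) as a nested list (assumed normalization)."""
    # Start from the identity so every joint also aggregates its own features.
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def graph_conv(x, a_hat, w):
    """One spatial graph convolution, A_hat @ X @ W, without a nonlinearity.

    x:     per-joint features, shape (num_joints, c_in)
    a_hat: normalized adjacency, shape (num_joints, num_joints)
    w:     weight matrix, shape (c_in, c_out)
    """
    n, c_in, c_out = len(x), len(w), len(w[0])
    # Aggregate features from each joint's neighbours along the skeleton graph.
    agg = [[sum(a_hat[i][k] * x[k][c] for k in range(n)) for c in range(c_in)]
           for i in range(n)]
    # Mix channels with the weight matrix (learned in a real model).
    return [[sum(agg[i][c] * w[c][o] for c in range(c_in)) for o in range(c_out)]
            for i in range(n)]
```

In a spatial-temporal setting, a layer of this kind would be applied to the joints of every frame and followed by a convolution along the time axis; that part is left out here to keep the sketch short.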
