Abstract

Human action recognition based on graph convolutional networks (GCNs) is one of the hotspots in computer vision. However, previous methods generally rely on handcrafted graphs, which limits the effectiveness of the model in characterising the connections between indirectly connected joints. This limitation weakens the modelled connections when joints are separated by long graph distances. To address this issue, the authors propose a skeleton simplification method that reduces both the number of joints and the distances between them by merging adjacent joints into simplified joints. A group convolutional block is devised to extract the internal features of the simplified joints. Additionally, the authors enhance the method with multi-scale modelling, which maps inputs into sequences at various levels of simplification. Combined with spatial-temporal graph convolution, a multi-scale skeleton simplification GCN for skeleton-based action recognition (M3S-GCN) is proposed to fuse multi-scale skeleton sequences and model the connections between joints. Finally, M3S-GCN is evaluated on five benchmarks: NTU RGB+D 60 (C-Sub, C-View), NTU RGB+D 120 (X-Sub, X-Set) and NW-UCLA. Experimental results show that M3S-GCN achieves state-of-the-art performance, with accuracies of 93.0%, 97.0% and 91.2% on the C-Sub, C-View and X-Set benchmarks respectively, which validates the effectiveness of the method.
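The core idea of merging adjacent joints into simplified joints can be illustrated with a minimal sketch. The grouping below is hypothetical (a 10-joint toy skeleton partitioned into 4 body parts) and the merge is a plain coordinate average; the paper's actual partition of the NTU skeleton and its group convolutional block are not reproduced here.

```python
import numpy as np

# Hypothetical grouping of a 10-joint toy skeleton into 4 simplified joints.
# The paper's actual merging scheme for the NTU 25-joint skeleton may differ.
GROUPS = [
    [0, 1, 2],   # e.g. head + neck + torso
    [3, 4],      # left arm
    [5, 6],      # right arm
    [7, 8, 9],   # legs
]

def simplify_skeleton(joints: np.ndarray, groups=GROUPS) -> np.ndarray:
    """Merge each group of adjacent joints into one simplified joint
    by averaging their coordinates.

    joints: array of shape (T, V, C) — frames, joints, coordinate channels.
    returns: array of shape (T, len(groups), C) — the simplified sequence.
    """
    return np.stack([joints[:, g, :].mean(axis=1) for g in groups], axis=1)

# A 2-frame sequence of 10 joints in 3D:
seq = np.random.rand(2, 10, 3)
simplified = simplify_skeleton(seq)
print(simplified.shape)  # (2, 4, 3)
```

Because two simplified joints are adjacent whenever any of their member joints were connected, the hop distance between formerly far-apart joints shrinks, which is what lets the subsequent graph convolutions relate them more directly.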
