Abstract

Graph convolutional networks (GCNs) have attracted increasing interest in action recognition in recent years. GCNs model human skeleton sequences as spatio-temporal graphs, and attention mechanisms are often used jointly with them to highlight important frames or body joints in a sequence. However, attention modules learn their parameters offline and keep them fixed, so they may not adapt well to unseen samples. In this paper, we propose a simple but effective motion-driven spatial and temporal adaptation strategy that dynamically strengthens the features of important frames and joints for skeleton-based action recognition. The rationale is that joints and frames with dramatic motion are generally more informative and discriminative. We combine the spatial and temporal refinements in a two-branch structure, in which the joint-wise and frame-wise feature refinements run in parallel; such a structure helps learn more complementary feature representations. Moreover, we propose using fully connected graph convolution to learn long-range spatial dependencies. In addition, we investigate two high-resolution skeleton graphs built by creating virtual joints, aiming to improve the representation of skeleton features. Combining these proposals, we develop a novel motion-driven spatial and temporal adaptive high-resolution GCN. Experimental results demonstrate that the proposed model achieves state-of-the-art (SOTA) results on the challenging large-scale Kinetics-Skeleton and UAV-Human datasets, and that it is on par with SOTA methods on the NTU RGB+D 60 and 120 datasets. Additionally, our motion-driven adaptation method shows encouraging performance compared with attention mechanisms.
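The core idea described above can be sketched as follows. This is a minimal, hypothetical illustration (not the paper's implementation): per-joint motion magnitudes are estimated from frame-to-frame differences and turned into joint-wise and frame-wise weights, which two parallel branches use to strengthen the features of fast-moving joints and frames. All function names and the residual-scaling form are assumptions for illustration.

```python
import numpy as np

def motion_driven_weights(x, eps=1e-6):
    """Hypothetical sketch of motion-driven adaptation.

    x: skeleton feature sequence of shape (T, V, C)
       (T frames, V joints, C channels).
    Returns joint-wise weights (T, V) and frame-wise weights (T,).
    """
    # Frame-to-frame motion magnitude along the temporal axis.
    motion = np.abs(np.diff(x, axis=0))                     # (T-1, V, C)
    motion = np.concatenate([motion, motion[-1:]], axis=0)  # pad back to T

    # Joint-wise weights: mean motion per joint, normalized within each frame.
    joint_w = motion.mean(axis=2)                           # (T, V)
    joint_w = joint_w / (joint_w.sum(axis=1, keepdims=True) + eps)

    # Frame-wise weights: mean motion per frame, normalized over the sequence.
    frame_w = motion.mean(axis=(1, 2))                      # (T,)
    frame_w = frame_w / (frame_w.sum() + eps)
    return joint_w, frame_w

def two_branch_refine(x):
    """Two parallel branches (spatial and temporal), combined by addition."""
    joint_w, frame_w = motion_driven_weights(x)
    spatial = x * (1.0 + joint_w[..., None])        # strengthen moving joints
    temporal = x * (1.0 + frame_w[:, None, None])   # strengthen moving frames
    return spatial + temporal
```

Because the weights are computed from each input sequence at inference time, they adapt per sample, in contrast to attention parameters that are fixed after training.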
