Abstract

Effectively extracting discriminative spatial and temporal features is important for skeleton-based action recognition. However, current research on skeleton-based action recognition mainly focuses on the natural connections of the skeleton and the original temporal sequences of skeleton frames, ignoring the relations between non-adjacent but inter-related joints and the varying velocities of action instances. To overcome these limitations and thereby enhance spatial and temporal feature extraction for action recognition, we propose a novel Spatial Attention-Enhanced Multi-Timescale Graph Convolutional Network (SA-MTGCN) for skeleton-based action recognition. Specifically, since the relations between non-adjacent but inter-related joints are beneficial for action recognition, we propose an Attention-Enhanced Spatial Graph Convolutional Network (A-SGCN) that exploits both the natural connections and the inter-related relations of joints. Furthermore, a Multi-Timescale (MT) structure is proposed to enhance temporal feature extraction by aggregating different network layers to model the different velocities of action instances. Experimental results on the widely used NTU and Kinetics datasets demonstrate the effectiveness of our approach.
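The core idea of combining the skeleton's natural adjacency with an attention matrix over non-adjacent joints can be illustrated with a minimal sketch. This is not the paper's A-SGCN implementation: the 5-joint chain skeleton, the random attention weights, and all dimensions below are illustrative assumptions, and a real model would learn the attention matrix end to end.

```python
import numpy as np

rng = np.random.default_rng(0)

num_joints, in_ch, out_ch = 5, 3, 4  # toy sizes, not from the paper

# Natural skeleton connections (a simple chain 0-1-2-3-4), plus self-loops.
A = np.zeros((num_joints, num_joints))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4)]:
    A[i, j] = A[j, i] = 1.0
A += np.eye(num_joints)

# Symmetric degree normalization: D^{-1/2} A D^{-1/2}.
d_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
A_norm = d_inv_sqrt @ A @ d_inv_sqrt

# Attention matrix linking non-adjacent joints; random here as a stand-in
# for weights a network would learn.
B = rng.uniform(0.0, 0.1, size=(num_joints, num_joints))

X = rng.standard_normal((num_joints, in_ch))  # per-joint input features
W = rng.standard_normal((in_ch, out_ch))      # shared feature transform

# Spatial graph convolution over the combined graph: (A_norm + B) X W.
out = (A_norm + B) @ X @ W
print(out.shape)  # → (5, 4)
```

The attention term lets information flow between joints that are far apart on the skeleton (e.g. two hands during clapping) even though the natural adjacency matrix contains no edge between them.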
