Abstract

Graph Convolutional Networks have been successfully applied to skeleton-based action recognition, where the key is fully exploiting the spatial-temporal context. This letter proposes a Focusing-Diffusion Graph Convolutional Network (FDGCN) to address this issue. Each skeleton frame is first decomposed into two opposite-direction graphs for the subsequent focusing and diffusion processes. In the focusing process, an attention module generates a spatial-level representation for each frame individually; this representation is regarded as a supernode that aggregates the features of the joint nodes in that frame to extract spatial context. After supernodes have been generated for the entire sequence, a transformer encoder layer is proposed to further capture the temporal context. Finally, in the diffusion process, these supernodes pass the embedded spatial-temporal context back to the spatial joints through the diffusion graph. Extensive experiments on the NTU RGB+D and Skeleton-Kinetics benchmarks demonstrate the effectiveness of our approach.
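To make the focusing-diffusion pipeline concrete, the following is a minimal PyTorch sketch of the idea as described in the abstract: attention pooling of joint features into a per-frame supernode (focusing), a transformer encoder over the supernode sequence (temporal context), and a broadcast of that context back to every joint (diffusion). All module names, shapes, and the exact pooling and diffusion formulations here are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of a focusing-diffusion block; details are assumptions.
import torch
import torch.nn as nn


class FocusingDiffusionBlock(nn.Module):
    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        # Focusing: scores used to attention-pool joints into one supernode per frame.
        self.focus_score = nn.Linear(channels, 1)
        # Temporal context: standard transformer encoder layer over the supernode sequence.
        self.temporal = nn.TransformerEncoderLayer(
            d_model=channels, nhead=num_heads, batch_first=True
        )
        # Diffusion: project the supernode context before adding it back to each joint.
        self.diffuse = nn.Linear(channels, channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, joints, channels) joint features from a preceding GCN stage.
        b, t, v, c = x.shape

        # Focusing: per-frame attention pooling of joint features into a supernode.
        attn = torch.softmax(self.focus_score(x), dim=2)   # (b, t, v, 1)
        supernodes = (attn * x).sum(dim=2)                 # (b, t, c)

        # Temporal modeling over the supernode sequence.
        supernodes = self.temporal(supernodes)             # (b, t, c)

        # Diffusion: broadcast the spatial-temporal context back to every joint.
        context = self.diffuse(supernodes).unsqueeze(2)    # (b, t, 1, c)
        return x + context                                 # residual joint features


# Example usage: batch of 2 clips, 16 frames, 25 joints, 64 channels.
feats = torch.randn(2, 16, 25, 64)
out = FocusingDiffusionBlock(64)(feats)
print(out.shape)  # torch.Size([2, 16, 25, 64])
```

In this sketch the diffusion step is a simple residual broadcast; in the letter it is carried out through the opposite-direction diffusion graph rather than a plain linear projection.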
