Spatial Residual Layer and Dense Connection Block Enhanced Spatial Temporal Graph Convolutional Network for Skeleton-Based Action Recognition

Cong Wu, Xiao-Jun Wu, Josef Kittler

doi:10.1109/iccvw.2019.00216

Publication Date: Oct 1, 2019

Citations: 47

Affiliation: Jiangnan University, University of Surrey

Abstract

Recent research has shown that modeling the dynamic joint features of the human body with a graph convolutional network (GCN) is a groundbreaking approach to skeleton-based action recognition, especially for recognizing body motion, human-object interactions, and human-human interactions. Nevertheless, how to model and exploit coherent skeleton information comprehensively remains an open problem. To capture rich spatiotemporal information and use features more effectively, we introduce a spatial temporal graph convolutional network enhanced with a spatial residual layer and a dense connection block. Our contribution has three aspects. First, in the so-called spatial residual layer, we extend spatial graph convolution to spatial temporal graph convolution with a cross-domain residual, which extracts more precise and informative spatiotemporal features and reduces training complexity through feature fusion. Second, instead of simply stacking multiple similar layers, we use dense connections to take full advantage of global information. Third, we combine these two components to create an enhanced spatial temporal graph convolutional network (ST-GCN), referred to as SDGCN. The resulting graph representation has a new structure. We perform extensive experiments on two large datasets, Kinetics and NTU-RGB+D, and our method achieves a substantial improvement in performance over mainstream methods. We evaluate our method both quantitatively and qualitatively, demonstrating its effectiveness.
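To make the two components named in the abstract more concrete, the following is a minimal pure-Python sketch of one possible reading of them. It is not the authors' implementation: the function names, the row-normalized adjacency, the fixed temporal kernel, the point where the skip connection is added, and the toy sizes are all assumptions chosen for illustration. Features are stored as nested lists indexed by frame, joint, and channel.

```python
# Illustrative sketch only. All names, shapes, and wiring are assumptions,
# not the SDGCN code from the paper.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def relu(m):
    return [[max(0.0, v) for v in row] for row in m]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def normalize_adjacency(adj):
    """Add self-loops and row-normalize, i.e. D^-1 (A + I)."""
    n = len(adj)
    loops = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    return [[v / sum(row) for v in row] for row in loops]

def spatial_gcn(x, a_hat, w):
    """Spatial graph convolution on one frame: A_hat @ X @ W (X is joints x channels)."""
    return matmul(matmul(a_hat, x), w)

def temporal_conv(frames, kernel):
    """Zero-padded 1-D convolution over time, shared by every joint and channel."""
    t_len, half = len(frames), len(kernel) // 2
    out = []
    for i in range(t_len):
        acc = [[0.0] * len(frames[0][0]) for _ in frames[0]]
        for k, wk in enumerate(kernel):
            j = i + k - half
            if 0 <= j < t_len:
                acc = add(acc, [[wk * v for v in row] for row in frames[j]])
        out.append(acc)
    return out

def spatial_residual_layer(frames, a_hat, w, kernel):
    """Spatial GCN, then temporal conv, with a skip from the spatial features
    fused back into the spatiotemporal ones (assumed form of the residual)."""
    spatial = [relu(spatial_gcn(x, a_hat, w)) for x in frames]
    temporal = temporal_conv(spatial, kernel)
    return [relu(add(s, tm)) for s, tm in zip(spatial, temporal)]

def concat_channels(feature_list):
    """Concatenate several frame x joint x channel tensors along the channel axis."""
    t_len, v_len = len(feature_list[0]), len(feature_list[0][0])
    return [[[c for f in feature_list for c in f[t][v]] for v in range(v_len)]
            for t in range(t_len)]

def dense_block(frames, a_hat, weights, kernel):
    """Dense connection: every layer receives all earlier outputs, concatenated."""
    features = [frames]
    for w in weights:
        features.append(spatial_residual_layer(concat_channels(features), a_hat, w, kernel))
    return concat_channels(features)

# Toy example: 4 frames, a 3-joint chain, 2 input channels, growth rate 3.
frames = [[[float(t + v), 1.0] for v in range(3)] for t in range(4)]
a_hat = normalize_adjacency([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
weights = [[[0.1] * 3 for _ in range(2)],   # layer 1: 2 -> 3 channels
           [[0.1] * 3 for _ in range(5)]]   # layer 2: (2 + 3) -> 3 channels
out = dense_block(frames, a_hat, weights, kernel=[0.25, 0.5, 0.25])
print(len(out), len(out[0]), len(out[0][0]))  # 4 frames, 3 joints, 2 + 2*3 = 8 channels
```

In this sketch the channel count grows by a fixed amount per layer because each layer's output is appended to, rather than replacing, the features that came before it. That is the usual sense of a dense connection; the exact fusion used in SDGCN may differ.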

Full Text