Skeleton-based Action Recognition with Two-Branch Graph Convolutional Networks

Zhi Li, Qici Xie, Yunhua Lu, Xian Wang

doi:10.1088/1742-6596/2030/1/012091

Open Access

Abstract

Graph convolution is a popular technique for action recognition from human skeleton data. Because a human skeleton can be treated as a graph in three dimensions, Graph Convolutional Networks (GCNs) represent the input as a graph structure, and numerous GCN-based approaches to action recognition have achieved strong results. However, even though the input to a GCN is structured as a four-dimensional tensor, a GCN alone cannot fully exploit the rich action-related information the data contains. We therefore propose a new model in which the three-dimensional skeleton data is fed into both a Convolutional Neural Network (CNN) branch and a Graph Convolutional Network branch, which extract spatiotemporal features separately; the outputs of the two branches are then fused for prediction. Because the time-domain information is rich, feature extraction is enhanced by increasing the model’s depth. After the pooling layer and the fully connected layer, we concatenate the outputs at the ends of the graph data stream and the convolution data stream to obtain the network’s final output, and the prediction is obtained via a softmax layer. On the Kinetics-400 dataset, the proposed model outperforms the ST-GCN baseline in accuracy. The experimental results indicate that the proposed model improves generalization ability and classification performance for action recognition.
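The two-branch idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the layer sizes, the single graph-convolution layer, the temporal-only convolution kernels, the placement of one shared fully connected layer after concatenation, and all function and parameter names (`graph_branch`, `conv_branch`, `w_graph`, `k_conv`, `w_out`, `b_out`) are assumptions chosen for illustration, and plain Python lists are used only to keep the example self-contained. A skeleton clip is taken to be a `T x V x C` array (frames, joints, coordinates).

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalized_adjacency(edges, num_joints):
    """Return D^-1/2 (A + I) D^-1/2 for an undirected skeleton graph."""
    a = [[1.0 if i == j else 0.0 for j in range(num_joints)] for i in range(num_joints)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_joints)]
            for i in range(num_joints)]


def graph_branch(frames, a_hat, w):
    """Per-frame graph convolution ReLU(A_hat X W), then global average pooling."""
    feats = [[[max(0.0, v) for v in row] for row in matmul(matmul(a_hat, x), w)]
             for x in frames]
    t, v, d = len(feats), len(feats[0]), len(w[0])
    return [sum(feats[ti][vi][k] for ti in range(t) for vi in range(v)) / (t * v)
            for k in range(d)]


def conv_branch(frames, kernels):
    """Temporal convolution (valid padding) per joint, ReLU, global average pooling.

    kernels has shape D x K x C: D output features, K frames wide, C input channels.
    """
    k_size = len(kernels[0])
    steps = len(frames) - k_size + 1
    num_joints = len(frames[0])
    pooled = []
    for kernel in kernels:
        total = 0.0
        for t in range(steps):
            for v in range(num_joints):
                act = sum(kernel[k][c] * frames[t + k][v][c]
                          for k in range(k_size) for c in range(len(kernel[k])))
                total += max(0.0, act)
        pooled.append(total / (steps * num_joints))
    return pooled


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def predict(frames, edges, params):
    """Run both branches, concatenate their outputs, classify with softmax."""
    a_hat = normalized_adjacency(edges, len(frames[0]))
    fused = graph_branch(frames, a_hat, params["w_graph"]) + conv_branch(frames, params["k_conv"])
    logits = [sum(w * f for w, f in zip(row, fused)) + b
              for row, b in zip(params["w_out"], params["b_out"])]
    return softmax(logits)


def random_tensor(rng, *shape):
    """Nested list of the given shape with values drawn uniformly from [-1, 1]."""
    if len(shape) == 1:
        return [rng.uniform(-1.0, 1.0) for _ in range(shape[0])]
    return [random_tensor(rng, *shape[1:]) for _ in range(shape[0])]


# Toy example: 6 frames, 3 joints in a chain, 3D coordinates, 5 action classes.
rng = random.Random(0)
frames = random_tensor(rng, 6, 3, 3)
edges = [(0, 1), (1, 2)]
params = {
    "w_graph": random_tensor(rng, 3, 4),   # C x D_graph
    "k_conv": random_tensor(rng, 4, 3, 3), # D_conv x K x C
    "w_out": random_tensor(rng, 5, 8),     # classes x (D_graph + D_conv)
    "b_out": random_tensor(rng, 5),
}
probs = predict(frames, edges, params)
```

With untrained random weights the resulting `probs` carries no meaning beyond being a valid probability distribution over the five hypothetical classes; the point of the sketch is only the data flow, in which the graph stream and the convolution stream are pooled independently and joined by concatenation before classification.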

Full Text