Abstract

Graph convolution is a popular technique for action recognition from human skeleton data. Because a human skeleton can be treated as a graph embedded in three-dimensional space, Graph Convolutional Networks (GCNs) represent the input as a graph structure to perform recognition, and numerous GCN-based approaches have achieved strong results. However, although GCNs structure the input as a four-dimensional tensor, they still cannot fully exploit the rich action-related information it contains. We therefore propose a new model in which the three-dimensional skeleton data is fed into both a Convolutional Neural Network (CNN) branch and a Graph Convolutional Network branch, each of which extracts spatiotemporal features separately; their outputs are then fused for prediction. Given the richness of temporal-domain features, feature extraction is strengthened by increasing the model's depth. After the pooling and fully connected layers, we concatenate the outputs at the ends of the graph stream and the convolution stream to obtain the network's final output, and the prediction is obtained through a softmax layer. On the Kinetics-400 dataset, the proposed model achieves higher accuracy than the ST-GCN baseline. The experimental results indicate that the proposed model improves both the generalization ability and the classification performance of action recognition.
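The abstract describes the two-stream architecture only at a high level: it gives no layer sizes, kernel widths, skeleton adjacency, or network depth, and it does not say exactly how the concatenated stream outputs become class scores. The following is a minimal pure-Python sketch of the general idea under stated assumptions, not the authors' implementation. Assumed choices: the graph stream is one spatial graph convolution per frame with the common normalization D^-1/2 (A + I) D^-1/2; the convolution stream is one 1-D temporal convolution per joint; each stream ends in global average pooling and a fully connected layer; and a hypothetical final linear layer maps the concatenated vector to class scores before the softmax. All function names, parameter names, and the toy 5-joint skeleton are hypothetical. ST-GCN-style models also interleave temporal convolutions in the graph stream and stack many layers; both are omitted here for brevity.

```python
import math
import random


def matmul(a, b):
    """Multiply an n x k matrix by a k x m matrix (lists of lists)."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def linear(x, w, b):
    """Fully connected layer: x (length n) times w (n x m) plus bias b (length m)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) + b[j] for j in range(len(b))]


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def normalize_adjacency(edges, num_joints):
    """Assumed GCN normalization: D^-1/2 (A + I) D^-1/2 over the skeleton graph."""
    a = [[1.0 if i == j else 0.0 for j in range(num_joints)] for i in range(num_joints)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_joints)]
            for i in range(num_joints)]


def gcn_stream(frames, a_hat, w):
    """Graph stream: ReLU(A_hat X_t W) for each frame, then average over frames and joints."""
    pooled = [0.0] * len(w[0])
    for x_t in frames:  # x_t is a V x C matrix of joint coordinates
        h = matmul(matmul(a_hat, x_t), w)  # V x H
        for row in h:
            for k, val in enumerate(row):
                pooled[k] += max(0.0, val)
    return [p / (len(frames) * len(frames[0])) for p in pooled]


def cnn_stream(frames, kernels):
    """Convolution stream: 1-D temporal convolution per joint (no padding), ReLU,
    then average pooling. kernels[h][dt] holds C weights for channel h at offset dt."""
    t_len, v_len, k_len = len(frames), len(frames[0]), len(kernels[0])
    positions = (t_len - k_len + 1) * v_len
    pooled = []
    for kernel in kernels:
        total = 0.0
        for t in range(t_len - k_len + 1):
            for v in range(v_len):
                s = sum(kernel[dt][c] * frames[t + dt][v][c]
                        for dt in range(k_len) for c in range(len(kernel[dt])))
                total += max(0.0, s)
        pooled.append(total / positions)
    return pooled


def two_stream_predict(frames, edges, params):
    """Pool + FC in each stream, concatenate, then an assumed final FC and softmax."""
    a_hat = normalize_adjacency(edges, len(frames[0]))
    g = linear(gcn_stream(frames, a_hat, params["w_gcn"]), params["fc_g"], params["b_g"])
    c = linear(cnn_stream(frames, params["kernels"]), params["fc_c"], params["b_c"])
    fused = g + c  # list concatenation: [graph stream | convolution stream]
    return softmax(linear(fused, params["w_out"], params["b_out"]))


def rand_matrix(rows, cols, rng):
    """Random weights in [-1, 1]; stands in for trained parameters."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    T, V, C, H, D, K = 8, 5, 3, 4, 6, 3  # frames, joints, coords, hidden, embed, classes
    edges = [(0, 1), (1, 2), (1, 3), (1, 4)]  # hypothetical 5-joint skeleton
    frames = [rand_matrix(V, C, rng) for _ in range(T)]
    params = {
        "w_gcn": rand_matrix(C, H, rng),
        "kernels": [rand_matrix(3, C, rng) for _ in range(H)],
        "fc_g": rand_matrix(H, D, rng), "b_g": [0.0] * D,
        "fc_c": rand_matrix(H, D, rng), "b_c": [0.0] * D,
        "w_out": rand_matrix(2 * D, K, rng), "b_out": [0.0] * K,
    }
    print(two_stream_predict(frames, edges, params))
```

Concatenating the two stream outputs before a final classifier keeps the spatial (graph) and temporal (convolution) evidence separate until the last layer, which matches the late-fusion description in the abstract; a real model would use a tensor library, batched inputs of shape (N, C, T, V), and trained weights.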
