Abstract

Action recognition is of great significance in the field of machine vision. In recent years, skeleton-based action recognition models have made great progress, but little research has addressed the extraction of weak skeleton features, leading to insufficient generalization of the trained models. This work proposes to use the Transformer structure and its attention mechanism: skeleton features extracted via a GCN are fed into the Transformer, which captures the behavior they encode. Furthermore, the original ST-GCN model is optimized by introducing an adaptive graph convolutional layer to increase its flexibility, and by adding an attention mechanism in a separate spatiotemporal-channel module to further enhance the adaptive graph convolutional layer. Experiments on the NTU-RGBD dataset show that the model achieves some improvement in action-recognition accuracy.
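The abstract gives no implementation details, but the adaptive graph convolutional layer it describes is commonly formulated (as in 2s-AGCN) by combining a fixed skeleton adjacency with a learned global graph and a data-dependent graph. The following is a minimal sketch of that formulation only; the class name, the embed_channels parameter, and all variable names are illustrative assumptions, not the authors' code.

    # Sketch of an adaptive graph convolution layer (2s-AGCN-style).
    # All names here are assumptions for illustration.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class AdaptiveGraphConv(nn.Module):
        def __init__(self, in_channels, out_channels, A, embed_channels=16):
            super().__init__()
            # A: (V, V) normalized skeleton adjacency (fixed structural prior)
            self.register_buffer('A', A)
            # B: learned global graph, shared across all samples
            self.B = nn.Parameter(torch.zeros_like(A))
            # theta/phi embed joint features to build a sample-specific graph C
            self.theta = nn.Conv2d(in_channels, embed_channels, 1)
            self.phi = nn.Conv2d(in_channels, embed_channels, 1)
            self.conv = nn.Conv2d(in_channels, out_channels, 1)

        def forward(self, x):
            # x: (N, C, T, V) -- batch, channels, frames, joints
            N, C, T, V = x.shape
            # data-dependent graph C: softmax similarity of joint embeddings
            th = self.theta(x).permute(0, 3, 1, 2).reshape(N, V, -1)  # (N, V, C'*T)
            ph = self.phi(x).reshape(N, -1, V)                        # (N, C'*T, V)
            Cg = F.softmax(torch.bmm(th, ph), dim=-1)                 # (N, V, V)
            adj = self.A + self.B + Cg  # adaptive adjacency per sample
            # aggregate joint features over the graph, then 1x1 conv
            y = torch.einsum('nctv,nvw->nctw', x, adj)
            return self.conv(y)

For NTU-RGBD (25 joints), such a layer would be used as, e.g., AdaptiveGraphConv(3, 64, A) with A the (25, 25) normalized skeleton adjacency and input of shape (batch, 3, frames, 25).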
