Human action recognition is a central and challenging task in computer vision: it requires analyzing both the spatial dependencies among targets and how those targets change over time. In recent years, advances in deep learning have produced numerous action recognition methods based on deep neural networks. Because the skeleton joints of the human body naturally form a graph, graph neural networks (GNNs) have emerged as an effective tool for modeling such data and have attracted significant research interest. This paper addresses the slow inference caused by overly complex deep graph convolutional models. We compress the network through knowledge distillation in a teacher-student architecture, yielding a compact, lightweight student GNN. To improve the model’s robustness and generalization, we introduce a data augmentation mechanism that generates diverse action sequences while preserving their action labels, giving the model a broader basis for learning. The proposed model thus learns from three distinct sources of knowledge: teacher networks, the original datasets, and the derived (augmented) data. Combining knowledge distillation with data augmentation enables lightweight student networks to outperform their teacher networks in both recognition performance and efficiency. Experimental results on skeleton-based human action recognition demonstrate the efficacy of our approach and highlight its potential to simplify state-of-the-art models while improving their performance.
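
As an illustration of how the three learning paths might be combined into a single training objective, the following is a minimal sketch and not the formulation used in this paper. It assumes a standard temperature-softened distillation term (teacher path), a cross-entropy term on the original sample (dataset path), and a cross-entropy term on a label-preserving augmented sample (derived-data path). The function names, the temperature, and the weights `alpha` and `beta` are all hypothetical choices made for this example.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Cross-entropy of one sample against its ground-truth class index."""
    return -math.log(softmax(logits)[label])

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(student_orig, student_aug, teacher_orig, label,
                      temperature=4.0, alpha=0.5, beta=0.5):
    """Hypothetical three-path objective (all weights are assumptions).

    student_orig: student logits on the original skeleton sequence
    student_aug:  student logits on an augmented copy with the same label
    teacher_orig: teacher logits on the original skeleton sequence
    """
    # Path 1 (teacher): match the teacher's softened class distribution.
    soft = kl_divergence(softmax(teacher_orig, temperature),
                         softmax(student_orig, temperature))
    soft *= temperature ** 2  # conventional rescaling of the softened term
    # Path 2 (original data): supervised loss on the ground-truth label.
    hard_orig = cross_entropy(student_orig, label)
    # Path 3 (derived data): same label, augmented input.
    hard_aug = cross_entropy(student_aug, label)
    return alpha * soft + (1 - alpha) * hard_orig + beta * hard_aug
```

In this sketch the distillation term vanishes when the student reproduces the teacher's logits exactly, so any remaining loss comes from the two supervised paths. How the three terms are actually weighted, and how the augmented sequences are generated, are left open here.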