Abstract This paper first proposes a multimodal fusion-based human motion recognition technique for college physical dance teaching, in which feature-level or decision-level fusion maps human kinematic semantic information from the datasets to the classifiers within a single-stage pipeline. It then proposes a multi-level multimodal fusion approach to human motion recognition that adapts better to practical application scenarios: at the input side, the depth data are converted into depth motion projection maps and the inertial data into signal images. All modalities are then passed through a convolutional neural network to extract features, and the extracted features are fused at the feature level by discriminant correlation analysis. The results show that the multi-level multimodal fusion framework achieves recognition accuracies of 99.8% and 99.9% on the two datasets, respectively; it reaches 100% accuracy in recognizing the Throw and Catch actions and its lowest rate, 95.36%, on the Clap action, for an average recognition rate of 97.89%. After optimization, the multi-level multimodal fusion model can recover movement data close to the actual movement even from low-precision measurements, providing data support for physical dance teaching and learning.
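As a concrete illustration of the two input-side conversions the abstract names, the following is a minimal NumPy sketch. The function names `depth_motion_map` and `signal_image` are hypothetical, and the noise threshold, channel-tiling order, and 64x64 target size are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def depth_motion_map(depth_frames, eps=10.0):
    """Collapse a depth-video sequence into one 2D map by accumulating
    absolute inter-frame differences (a front-view motion projection)."""
    frames = [f.astype(np.float64) for f in depth_frames]
    dmm = np.zeros_like(frames[0])
    for prev, curr in zip(frames[:-1], frames[1:]):
        diff = np.abs(curr - prev)
        dmm += np.where(diff > eps, diff, 0.0)  # drop sub-threshold sensor noise
    return dmm / max(dmm.max(), 1e-8)           # normalize to [0, 1] for the CNN

def signal_image(inertial, size=64):
    """Turn a (channels, T) inertial recording into a square grayscale
    'signal image' by tiling the channel rows until `size` rows are filled."""
    ch, T = inertial.shape
    rows = np.vstack([inertial[i % ch] for i in range(size)])
    if T >= size:                                # crop or zero-pad the time axis
        img = rows[:, :size]
    else:
        img = np.pad(rows, ((0, 0), (0, size - T)))
    lo, hi = img.min(), img.max()
    return (img - lo) / max(hi - lo, 1e-8)       # rescale to [0, 1]
```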
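For the fusion step, the abstract names discriminant correlation analysis (DCA). Below is a minimal sketch of DCA-based feature-level fusion following the standard formulation (after Haghighat et al., 2016), with concatenation as the final fusion rule; the function names, shapes, and tolerance values are assumptions for illustration, not the paper's released code.

```python
import numpy as np

def _between_class_whitener(F, labels, r):
    """Project F (d, n) onto r directions that whiten its between-class
    scatter S_b = Phi Phi^T, diagonalizing the small (c, c) matrix instead."""
    mu = F.mean(axis=1)
    Phi = np.column_stack([                      # columns: sqrt(n_c) * (mean_c - mean)
        np.sqrt((labels == c).sum()) * (F[:, labels == c].mean(axis=1) - mu)
        for c in np.unique(labels)
    ])
    lam, P = np.linalg.eigh(Phi.T @ Phi)         # eigenvalues in ascending order
    lam, P = np.maximum(lam[-r:], 1e-10), P[:, -r:]  # keep the r largest, clamp
    W = Phi @ P @ np.diag(1.0 / lam)             # W^T S_b W = I
    return W.T @ F                               # (r, n)

def dca_fuse(X, Y, labels):
    """Feature-level fusion by discriminant correlation analysis:
    whiten each set's between-class scatter, then diagonalize the
    cross-set covariance via SVD and concatenate the projections."""
    r = len(np.unique(labels)) - 1               # at most c-1 discriminant axes
    Xp = _between_class_whitener(X, labels, r)
    Yp = _between_class_whitener(Y, labels, r)
    U, s, Vt = np.linalg.svd(Xp @ Yp.T)          # S'_xy = U diag(s) V^T
    s = np.maximum(s, 1e-10)                     # guard against rank deficiency
    Xs = np.diag(s ** -0.5) @ U.T @ Xp           # unit cross-correlation pair
    Ys = np.diag(s ** -0.5) @ Vt @ Yp
    return np.vstack([Xs, Ys])                   # fused features, (2r, n)
```

In this sketch, `X` and `Y` would hold the CNN features extracted from the depth motion maps and the signal images, one column per sample; the fused matrix would then feed the final action classifier.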