Emotion recognition is critical to understanding students’ emotional states. However, crowded classrooms, changing lighting, and occlusion often reduce recognition accuracy. This study proposes an emotion recognition algorithm designed specifically for classroom environments. First, the study adds a custom MCC module and the Wise-IoU loss function to the YOLOv8 model to make object detection more accurate and efficient. Compared with the baseline YOLOv8x, the modified detector has 16% fewer parameters and runs inference 20% faster. Second, to address the complexity of the classroom setting and the specific requirements of the emotion recognition task, the study develops a multi-channel emotion recognition network (MultiEmoNet). This network fuses skeletal, environmental, and facial information, and introduces a center loss function and an attention module, AAM, to strengthen feature extraction. Experimental results show that MultiEmoNet achieves a classification accuracy of 91.4% on a self-built dataset of classroom student emotions, a 10% improvement over a single-channel classification algorithm. In addition, the study uses visual analysis to show how students’ emotions change during class, which helps teachers track students’ emotional states in real time. This paper demonstrates the potential of multi-channel information-fusion deep learning for classroom teaching analysis and offers new ideas and tools for improving emotion recognition techniques.
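The abstract names a center loss without defining it. As a rough illustration only, the sketch below follows the widely used center loss formulation (Wen et al., 2016): each feature vector is pulled toward a learned center for its class, L_C = ½ Σᵢ ‖xᵢ − c_{yᵢ}‖², and the centers are nudged toward the mean of their class members after each batch. The function names `center_loss` and `update_centers` and the step size `alpha` are hypothetical; the abstract does not state how MultiEmoNet weights or updates this term, and in practice it is usually added to a softmax cross-entropy loss as L = L_S + λ·L_C with an assumed weight λ.

```python
from typing import List, Sequence

Vector = List[float]


def center_loss(
    features: Sequence[Sequence[float]],
    labels: Sequence[int],
    centers: Sequence[Sequence[float]],
) -> float:
    """Half the summed squared Euclidean distance from each feature to its class center."""
    total = 0.0
    for x, y in zip(features, labels):
        c = centers[y]
        total += sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return 0.5 * total


def update_centers(
    features: Sequence[Sequence[float]],
    labels: Sequence[int],
    centers: Sequence[Sequence[float]],
    alpha: float = 0.5,  # hypothetical center learning rate
) -> List[Vector]:
    """Move each class center toward the features labelled with that class.

    delta_j = sum_i [y_i == j] (c_j - x_i) / (1 + count_j);  c_j <- c_j - alpha * delta_j
    Classes absent from the batch keep their current center.
    """
    new_centers = [list(c) for c in centers]
    for j, c_j in enumerate(centers):
        members = [x for x, y in zip(features, labels) if y == j]
        if not members:
            continue
        delta = [
            sum(c_k - x[k] for x in members) / (1 + len(members))
            for k, c_k in enumerate(c_j)
        ]
        new_centers[j] = [c_k - alpha * d for c_k, d in zip(c_j, delta)]
    return new_centers


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0]]
    labs = [0, 1]
    cents = [[0.0, 0.0], [0.0, 0.0]]
    print(center_loss(feats, labs, cents))     # 1.0
    print(update_centers(feats, labs, cents))  # [[0.25, 0.0], [0.0, 0.25]]
```

The `1 + count` denominator keeps a single outlier from dragging a center too far in one step, which is the reason the standard formulation uses it rather than a plain mean.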