By recognizing students’ facial expressions in actual classroom situations, students’ emotional states can be quickly uncovered, helping teachers gauge students’ learning progress and adjust their teaching strategies and methods accordingly, thereby improving the quality and effectiveness of classroom teaching. However, most previous facial expression recognition methods suffer from problems such as missing key facial features and imbalanced class distributions in the dataset, resulting in low recognition accuracy. To address these challenges, this paper proposes LCANet, a model built on a fused attention mechanism and a joint loss function that recognizes students’ emotions in real classroom scenarios. The model uses ConvNeXt V2 as the backbone network to strengthen its global feature extraction capability while attending more closely to the key regions of facial expressions. We incorporate an improved Channel Spatial Attention (CSA) module to extract more local feature information. Furthermore, to mitigate the class distribution imbalance in facial expression datasets, we introduce a joint loss function. The experimental results show that LCANet achieves good recognition rates on the public emotion datasets FERPlus, RAF-DB and AffectNet, with accuracies of 91.43%, 90.03% and 64.43%, respectively, demonstrating good robustness and generalizability. Additionally, we conducted experiments with the model in real classroom scenarios, detecting and accurately predicting students’ classroom emotions in real time, which provides an important reference for improving teaching in smart teaching scenarios.
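The abstract does not specify the exact form of the joint loss. A common way to address class imbalance is to combine standard cross-entropy with a focal-loss term that down-weights easy, well-classified examples; the sketch below illustrates that general pattern only, and the names `joint_loss`, `lam`, and `gamma` are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def focal_loss(probs, labels, gamma=2.0, alpha=None):
    """Multi-class focal loss, a common imbalance-aware term.

    probs  -- (N, C) predicted class probabilities (rows sum to 1)
    labels -- (N,) integer class indices
    gamma  -- focusing parameter; gamma=0 recovers plain cross-entropy
    alpha  -- optional (C,) per-class weights boosting rare classes
    """
    n = probs.shape[0]
    pt = probs[np.arange(n), labels]            # probability of the true class
    w = 1.0 if alpha is None else alpha[labels]
    # (1 - pt)^gamma shrinks the contribution of confident predictions
    return float(np.mean(-w * (1.0 - pt) ** gamma * np.log(pt + 1e-12)))

def joint_loss(probs, labels, lam=0.5, gamma=2.0):
    """Illustrative joint loss: weighted sum of cross-entropy and focal loss."""
    n = probs.shape[0]
    ce = float(np.mean(-np.log(probs[np.arange(n), labels] + 1e-12)))
    return lam * ce + (1.0 - lam) * focal_loss(probs, labels, gamma)
```

With `gamma=0` the focal term reduces to cross-entropy, so `lam` interpolates between emphasizing all samples equally and emphasizing hard, misclassified ones, which is the usual motivation for such joint losses on imbalanced expression datasets.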