Micro-expressions, fleeting facial movements lasting 1/25 to 1/3 of a second, offer crucial insights into genuine emotions, which makes them particularly valuable in online education. The rapid growth of English learning livestreams has heightened the need for accurate, real-time micro-expression recognition to enhance learner engagement and instructional effectiveness. However, existing methods struggle with the subtle nature of these expressions, especially in dynamic, low-resolution streaming environments. This paper presents TSG–MER–ELL, a novel end-to-end network for micro-expression recognition in English learning livestreams that integrates temporal–spatial correlation and graph attention mechanisms. The framework addresses the unique challenges of real-time emotion analysis in online language education, where subtle facial cues are crucial for understanding learner engagement and comprehension. The temporal–spatial correlation module applies spatio-temporal graph convolution over action units to aggregate features from diverse facial regions, while transformer encoders model long-range correlations. The graph attention module uses local facial areas to guide the self-attention computation, yielding precise local correlation features. These global and local features are then fused for the final micro-expression classification. We also introduce an adaptive loss function that balances accuracy, efficiency, and relevance to the linguistic context. Extensive experiments on the SMIC, CASME II, and SAMM datasets, adapted for English learning scenarios, demonstrate TSG–MER–ELL's superior performance over ten state-of-the-art baselines. The framework achieves the top UF1 and UAR scores across all datasets while improving both recognition speed and accuracy. Ablation studies and visualizations of the temporal–spatial features and graph attention weights provide insight into how the framework captures subtle emotional cues.
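To make the idea of "local facial areas guiding self-attention" concrete, the following is a minimal sketch, assuming the graph attention module restricts each facial region to attend only to regions adjacent to it in a predefined facial graph. It is single-head scaled dot-product attention with no learned projections; the function name, the region names, the adjacency, and the feature values are all hypothetical illustrations, not the authors' design or settings.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def masked_self_attention(q: Matrix, k: Matrix, v: Matrix,
                          adj: List[List[bool]]) -> Tuple[Matrix, Matrix]:
    """Hypothetical graph-masked attention: region i attends to j only if adj[i][j].

    Returns (attended_features, attention_weights). Every region must have at
    least one allowed neighbour (e.g. a self-loop), otherwise softmax is undefined.
    """
    n, d = len(q), len(q[0])
    features, weights = [], []
    for i in range(n):
        if not any(adj[i]):
            raise ValueError(f"region {i} has no allowed neighbours")
        # Scaled dot-product scores; masked pairs get -inf so softmax gives them 0.
        scores = [
            sum(q[i][t] * k[j][t] for t in range(d)) / math.sqrt(d) if adj[i][j] else -math.inf
            for j in range(n)
        ]
        m = max(scores)  # finite because at least one neighbour is allowed
        exps = [math.exp(s - m) if adj[i][j] else 0.0 for j, s in enumerate(scores)]
        z = sum(exps)
        w = [e / z for e in exps]
        weights.append(w)
        # Weighted sum of value vectors from the allowed regions only.
        features.append([sum(w[j] * v[j][t] for j in range(n)) for t in range(len(v[0]))])
    return features, weights


# Hypothetical regions: left_brow, right_brow, mouth (brows linked, mouth isolated).
regions = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]]
adjacency = [[True, True, False],
             [True, True, False],
             [False, False, True]]
out, attn = masked_self_attention(regions, regions, regions, adjacency)
```

Because the mask zeroes attention between unconnected regions, the returned weight matrix is also what one would plot when visualizing graph attention; here the mouth region can only attend to itself, so its output equals its input.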
TSG–MER–ELL’s robust performance in varied online learning conditions highlights its potential to enhance engagement, personalize instruction, and improve overall outcomes in virtual English language education.
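The UF1 and UAR scores reported above are unweighted (macro) averages over emotion classes, so rare classes count as much as frequent ones, which matters for the imbalanced SMIC, CASME II, and SAMM label distributions. The sketch below uses the common definitions (per-class F1 = 2TP / (2TP + FP + FN), per-class recall = TP / (TP + FN), each averaged over classes); the function name `uf1_uar` and the example labels are hypothetical, and the paper's exact evaluation protocol (for example, leave-one-subject-out pooling) is not specified here.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def uf1_uar(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> Tuple[float, float]:
    """Return (UF1, UAR): unweighted means of per-class F1 and per-class recall."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    # Average over the classes present in the ground truth.
    classes = sorted(set(y_true), key=str)
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p was wrong
            fn[t] += 1  # true class t was missed
    f1s, recalls = [], []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        f1s.append(2 * tp[c] / denom if denom else 0.0)
        recalls.append(tp[c] / (tp[c] + fn[c]))  # > 0 since c occurs in y_true
    return sum(f1s) / len(classes), sum(recalls) / len(classes)


# Hypothetical three-class example (negative / positive / surprise).
uf1, uar = uf1_uar(["negative", "negative", "positive", "surprise"],
                   ["negative", "positive", "positive", "surprise"])
```

In this example the per-class F1 scores are 2/3, 2/3 and 1, and the recalls are 1/2, 1 and 1, giving UF1 = 7/9 and UAR = 5/6. A predicted label that never occurs in the ground truth still counts as a false positive but is not itself averaged, which is one design choice among several.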