ABSTRACT
The quality of education hinges on the proficiency and training of educators. To support teacher training, the Teacher Moments platform creates simulated classroom scenarios. In this scenario‐based learning, confusion is an important indicator for detecting users who struggle with the simulations. Through Teacher Moments, we gathered 7975 audio responses from participants, who self‐labelled their recordings according to whether they sounded confused. Our dataset stands out for its size, for containing no actor‐generated audio, and for measuring confusion, an emotion neglected in artificial intelligence (AI). Our experiments tested unimodal approaches as well as feature‐level, model‐level and decision‐level fusion. Feature‐level fusion outperformed the unimodal methods, achieving a balanced accuracy of 0.6607 on the test set. This outcome highlights the need for further investigation into the overlooked area of confusion detection, particularly using realistic datasets like the one in this study and exploring new methods. Beyond teacher training, the insights of this research extend to other areas, such as training for other professionals who make critical decisions, user interface design and adaptive learning systems.