Abstract

Video-based learning has become an effective alternative to face-to-face instruction. In this context, modeling and predicting learners’ flow experience during video learning is critical for enhancing the learning experience and advancing learning technologies. In this study, we designed a video-learning instructional scenario based on flow theory. Different learning states, i.e., boredom, fit (flow), and anxiety, were successfully induced by varying the difficulty level of the learning task. We collected learners’ electrocardiogram (ECG) signals as well as facial video, upper-body posture, and speech data during the learning process. We then built classification models for learning state recognition and regression models for flow experience prediction using different combinations of data from the four modalities. The results showed that decision-level fusion of multimodal data significantly improved learning state recognition. Using the most important features selected from all data sources, such as the standard deviation of normal-to-normal R-R intervals (SDNN), high-frequency (HF) heart rate variability, and mel-frequency cepstral coefficients (MFCCs), a multilayer perceptron (MLP) classifier achieved the best recognition performance (mean AUC of 0.780), with per-class accuracies of 47.48% for boredom, 80.89% for fit (flow), and 47.41% for anxiety. For flow experience prediction, an MLP regressor based on the fusion of two modalities (ECG and posture) achieved the best performance (mean RMSE of 0.717). These results demonstrate the feasibility of modeling and predicting flow experience in video learning from multimodal data.
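To make the pipeline described above more concrete, the sketch below illustrates one way such a system could be assembled in Python with scikit-learn: an HRV feature (SDNN) computed from R-R intervals, one MLP classifier trained per modality, and decision-level fusion by averaging class probabilities before computing a macro one-vs-rest AUC. This is a minimal illustration on synthetic data, not the authors' actual implementation; the feature matrices, the `sdnn` helper, and the probability-averaging fusion rule are assumptions for demonstration only.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

def sdnn(rr_intervals_ms):
    """SDNN: standard deviation of normal-to-normal R-R intervals (ms), a common HRV feature."""
    return np.std(rr_intervals_ms, ddof=1)

# Synthetic stand-ins for per-modality feature matrices (hypothetical data).
rng = np.random.default_rng(0)
n = 300
y = rng.integers(0, 3, size=n)                    # 0 = boredom, 1 = fit (flow), 2 = anxiety
X_ecg = rng.normal(size=(n, 8)) + y[:, None]      # e.g., SDNN, HF power, other HRV features
X_speech = rng.normal(size=(n, 13)) + y[:, None]  # e.g., MFCC summary statistics

idx_train, idx_test = train_test_split(np.arange(n), test_size=0.3,
                                       stratify=y, random_state=0)

# Decision-level fusion: train one classifier per modality, then average probabilities.
probas = []
for X in (X_ecg, X_speech):
    clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)
    clf.fit(X[idx_train], y[idx_train])
    probas.append(clf.predict_proba(X[idx_test]))

fused = np.mean(probas, axis=0)                   # simple average of class probabilities
auc = roc_auc_score(y[idx_test], fused, multi_class="ovr", average="macro")
print(f"macro one-vs-rest AUC on synthetic data: {auc:.3f}")
```

Averaging predicted probabilities is only one common decision-level fusion scheme; weighted voting or a meta-classifier over per-modality outputs are equally plausible choices, and the abstract does not specify which rule was used.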
