Driver emotion recognition is crucial for enhancing safety and the user experience in driving scenarios. However, current emotion recognition methods often rely on a single modality and a single-task setup, leading to suboptimal performance in driving scenarios. To address this, this paper proposes a driver multitask emotion recognition method based on multimodal facial video analysis (MER-MFVA). The method extracts facial expression features and remote photoplethysmography (rPPG) signals from driver facial videos. The facial expression features, comprising facial action units and eye movement information, represent the driver's external characteristics; the rPPG signal, representing the driver's internal characteristics, is enhanced by a dual-path Transformer network with a dedicated focus module. We further propose a cross-modal mutual attention mechanism that fuses the multimodal features by computing mutual attention between the facial expression features and the rPPG information. For the final output, we adopt a multitask learning scheme in which discrete emotion recognition is the primary task, while emotion valence recognition, emotion arousal recognition, and the aforementioned rPPG extraction serve as auxiliary tasks, enabling effective information sharing across tasks. Experimental results on the established driver emotion dataset show that the proposed method significantly improves driver emotion recognition performance, achieving an accuracy of 86.98% and an F1 score of 85.83% on the primary task, which validates the effectiveness of the proposed approach.
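The abstract does not give implementation details of the cross-modal mutual attention fusion or the multitask heads. As a minimal illustrative sketch only, the snippet below shows one plausible two-way attention fusion between facial expression features and rPPG features followed by hypothetical primary and auxiliary task heads; the feature dimensions, module names, symmetric attention formulation, and number of emotion classes are assumptions, not the authors' exact design.

```python
# Illustrative sketch only: cross-modal mutual attention fusion between
# facial-expression features and rPPG features, with multitask output heads.
# All dimensions, names, and the symmetric two-way attention are assumptions.
import torch
import torch.nn as nn


class CrossModalMutualAttention(nn.Module):
    def __init__(self, dim: int = 128, num_heads: int = 4):
        super().__init__()
        # One attention block per direction: expression attends to rPPG and vice versa.
        self.expr_to_rppg = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.rppg_to_expr = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, expr_feat: torch.Tensor, rppg_feat: torch.Tensor) -> torch.Tensor:
        # expr_feat: (batch, T_expr, dim) facial expression token sequence
        # rppg_feat: (batch, T_rppg, dim) rPPG token sequence
        expr_attended, _ = self.expr_to_rppg(expr_feat, rppg_feat, rppg_feat)
        rppg_attended, _ = self.rppg_to_expr(rppg_feat, expr_feat, expr_feat)
        # Pool each attended sequence and concatenate into a joint representation.
        joint = torch.cat([expr_attended.mean(dim=1),
                           rppg_attended.mean(dim=1)], dim=-1)
        return self.fuse(joint)  # (batch, dim) fused multimodal feature


# Hypothetical multitask heads: discrete emotion (primary), valence/arousal (auxiliary).
fusion = CrossModalMutualAttention(dim=128)
emotion_head = nn.Linear(128, 7)   # number of discrete emotion classes is assumed
valence_head = nn.Linear(128, 1)
arousal_head = nn.Linear(128, 1)

expr = torch.randn(2, 32, 128)   # dummy facial-expression features
rppg = torch.randn(2, 64, 128)   # dummy rPPG features
fused = fusion(expr, rppg)
logits = emotion_head(fused)
valence, arousal = valence_head(fused), arousal_head(fused)
```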