Abstract

Recently, many studies have shown that integrating multiple modalities can identify human emotions more accurately and robustly than a single modality. However, how to fully exploit both the heterogeneity and the correlation of multiple modalities to improve emotion recognition performance remains a challenge. Given this, we propose a novel multimodal fusion method that considers heterogeneity and correlation simultaneously, realizing an end-to-end multimodal emotion recognition model built on an intra- and inter-modality attention fusion network. First, a dual-stream feature extractor is designed to extract emotional features from raw EEG signals and peripheral physiological signals (PPS) separately. Then, an inter-modality fusion module is introduced to capture the correlation and complementarity between the two feature streams. Meanwhile, an intra-modality encoding module is added to preserve the heterogeneity of each feature. Finally, a joint loss function is applied to train the model. The proposed model has been extensively validated on the DEAP and DREAMER multimodal datasets, achieving average accuracies of 97.97%/98.02% on the valence/arousal dimensions of DEAP and 99.47%/99.47% on the valence/arousal dimensions of DREAMER, outperforming state-of-the-art multimodal methods. Additionally, we explore the best combination of EEG with each peripheral physiological signal (e.g., EOG and EMG), which can inform the development of a low-cost and more effective multimodal emotion analysis system. The proposed method can provide new insights into multimodal fusion research for emotion recognition.
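
To make the described pipeline concrete, the following is a minimal PyTorch sketch of the overall idea: a dual-stream extractor, intra-modality self-attention, inter-modality cross-attention fusion, and a joint loss. The abstract does not specify implementation details, so the backbone (1D CNNs), token counts, attention heads, channel counts, and the auxiliary-loss weight `lam` are all illustrative assumptions, not the authors' actual configuration.

```python
import torch
import torch.nn as nn

class DualStreamExtractor(nn.Module):
    """Separate 1D-CNN encoders for EEG and peripheral physiological signals (PPS).
    Layer sizes are assumptions for illustration."""
    def __init__(self, eeg_channels, pps_channels, d_model=128):
        super().__init__()
        def branch(in_ch):
            return nn.Sequential(
                nn.Conv1d(in_ch, 64, kernel_size=7, padding=3), nn.BatchNorm1d(64), nn.ReLU(),
                nn.Conv1d(64, d_model, kernel_size=5, padding=2), nn.BatchNorm1d(d_model), nn.ReLU(),
                nn.AdaptiveAvgPool1d(16),  # hypothetical pooling to 16 temporal tokens
            )
        self.eeg_branch = branch(eeg_channels)
        self.pps_branch = branch(pps_channels)

    def forward(self, eeg, pps):
        # eeg: (B, C_eeg, T), pps: (B, C_pps, T) -> token sequences (B, 16, d_model)
        return self.eeg_branch(eeg).transpose(1, 2), self.pps_branch(pps).transpose(1, 2)

class IntraInterAttentionFusion(nn.Module):
    """Intra-modality self-attention preserves each modality's heterogeneity;
    inter-modality cross-attention captures correlation/complementarity."""
    def __init__(self, d_model=128, n_heads=4, n_classes=2, lam=0.5):
        super().__init__()
        self.intra_eeg = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.intra_pps = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_e2p = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_p2e = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.fused_head = nn.Linear(2 * d_model, n_classes)  # inter-modality branch
        self.eeg_head = nn.Linear(d_model, n_classes)         # intra-modality auxiliary heads
        self.pps_head = nn.Linear(d_model, n_classes)
        self.lam = lam                                        # assumed joint-loss weight

    def forward(self, eeg_tok, pps_tok):
        # Intra-modality encoding: each modality attends to itself.
        e_intra, _ = self.intra_eeg(eeg_tok, eeg_tok, eeg_tok)
        p_intra, _ = self.intra_pps(pps_tok, pps_tok, pps_tok)
        # Inter-modality fusion: each modality attends to the other.
        e2p, _ = self.cross_e2p(eeg_tok, pps_tok, pps_tok)
        p2e, _ = self.cross_p2e(pps_tok, eeg_tok, eeg_tok)
        fused = torch.cat([e2p.mean(1), p2e.mean(1)], dim=-1)
        return self.fused_head(fused), self.eeg_head(e_intra.mean(1)), self.pps_head(p_intra.mean(1))

    def joint_loss(self, logits, y):
        # Joint objective: fusion loss plus weighted intra-modality auxiliary losses.
        fused_logits, eeg_logits, pps_logits = logits
        ce = nn.functional.cross_entropy
        return ce(fused_logits, y) + self.lam * (ce(eeg_logits, y) + ce(pps_logits, y))

# Example forward pass on dummy DEAP-like shapes (32 EEG channels, 8 PPS channels, 3 s at 128 Hz).
extractor = DualStreamExtractor(eeg_channels=32, pps_channels=8)
fusion = IntraInterAttentionFusion()
eeg, pps = torch.randn(4, 32, 384), torch.randn(4, 8, 384)
logits = fusion(*extractor(eeg, pps))
loss = fusion.joint_loss(logits, torch.randint(0, 2, (4,)))
```

The design choice mirrored here is that the intra-modality heads act as auxiliary supervision alongside the fused prediction, so the shared encoders are trained to retain modality-specific (heterogeneous) information while the cross-attention path learns the inter-modality correlation.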
