Abstract

Joint learning of multiple modalities helps extract information shared across modalities and thereby improves the performance of multimodal emotion recognition. However, focusing on a single common pattern can cause multimodal data to deviate from its original distribution and fail to fully capture the latent representations of the data. We therefore propose a multi-dimensional homogeneous encoding spatial alignment (MHESA) method, which consists of two parts: multimodal joint learning and modal knowledge transfer. To obtain a common projection space for EEG and eye movement (EM) features, a multimodal joint space encoder learns a homogeneous EEG-EM joint space. To obtain a homogeneous encoding space based on modal knowledge, a knowledge transfer module learns the spatial distribution of the EM features while retaining the original EEG features. The outputs of the two modules are combined to construct a multi-dimensional homogeneous encoding space, whose weights and multi-task loss are dynamically adjusted by a multi-task joint optimization strategy (MJOS). Analysis of the multi-task optimization shows that, compared with the subject-dependent scenario, the cross-subject scenario benefits more from the construction of the joint encoding space, while the modal knowledge transfer features contribute more in the cross-session scenario. Experimental results show that MHESA enables the model to achieve more stable performance in all three emotion recognition scenarios.
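
To make the described two-branch design concrete, the following is a minimal, hypothetical PyTorch sketch of a joint space encoder, a knowledge transfer branch, and a dynamically weighted multi-task loss. The feature dimensions, module structures, MSE-based alignment/transfer losses, and uncertainty-style task weighting are all illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the MHESA-style two-branch architecture; all sizes and
# loss forms are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointSpaceEncoder(nn.Module):
    """Projects EEG and EM features into a shared (homogeneous) joint space."""
    def __init__(self, eeg_dim=310, em_dim=33, joint_dim=64):
        super().__init__()
        self.eeg_proj = nn.Sequential(nn.Linear(eeg_dim, joint_dim), nn.ReLU())
        self.em_proj = nn.Sequential(nn.Linear(em_dim, joint_dim), nn.ReLU())

    def forward(self, eeg, em):
        return self.eeg_proj(eeg), self.em_proj(em)

class KnowledgeTransfer(nn.Module):
    """Maps the original EEG features toward the EM feature distribution."""
    def __init__(self, eeg_dim=310, em_dim=33):
        super().__init__()
        self.map = nn.Linear(eeg_dim, em_dim)

    def forward(self, eeg):
        return self.map(eeg)

class MHESASketch(nn.Module):
    def __init__(self, eeg_dim=310, em_dim=33, joint_dim=64, n_classes=3):
        super().__init__()
        self.joint = JointSpaceEncoder(eeg_dim, em_dim, joint_dim)
        self.transfer = KnowledgeTransfer(eeg_dim, em_dim)
        self.classifier = nn.Linear(joint_dim * 2 + em_dim, n_classes)
        # Learnable task weights, adjusted dynamically during training (an assumption
        # standing in for the paper's MJOS).
        self.log_sigma = nn.Parameter(torch.zeros(2))

    def forward(self, eeg, em):
        z_eeg, z_em = self.joint(eeg, em)      # joint-space branch
        t_eeg = self.transfer(eeg)             # knowledge-transfer branch
        fused = torch.cat([z_eeg, z_em, t_eeg], dim=-1)
        return self.classifier(fused), z_eeg, z_em, t_eeg

def multi_task_loss(model, logits, labels, z_eeg, z_em, t_eeg, em):
    ce = F.cross_entropy(logits, labels)   # emotion classification task
    align = F.mse_loss(z_eeg, z_em)        # joint-space alignment task
    trans = F.mse_loss(t_eeg, em)          # transfer of EEG features toward EM space
    s = model.log_sigma
    # Uncertainty-style dynamic weighting of the two auxiliary tasks.
    return ce + torch.exp(-s[0]) * align + torch.exp(-s[1]) * trans + s.sum()

# Usage with random stand-in data:
model = MHESASketch()
eeg, em = torch.randn(8, 310), torch.randn(8, 33)
labels = torch.randint(0, 3, (8,))
logits, z_eeg, z_em, t_eeg = model(eeg, em)
loss = multi_task_loss(model, logits, labels, z_eeg, z_em, t_eeg, em)
loss.backward()
```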
