Objective. Physiological signals based emotion recognition is a prominent research domain in the field of human-computer interaction. Previous studies predominantly focused on unimodal data, giving limited attention to the interplay among multiple modalities. Within the scope of multimodal emotion recognition, integrating the information from diverse modalities and leveraging the complementary information are the two essential issues to obtain the robust representations. Approach. Thus, we propose a intermediate fusion strategy for combining low-rank tensor fusion with the cross-modal attention to enhance the fusion of electroencephalogram, electrooculogram, electromyography, and galvanic skin response. Firstly, handcrafted features from distinct modalities are individually fed to corresponding feature extractors to obtain latent features. Subsequently, low-rank tensor is fused to integrate the information by the modality interaction representation. Finally, a cross-modal attention module is employed to explore the potential relationships between the distinct latent features and modality interaction representation, and recalibrate the weights of different modalities. And the resultant representation is adopted for emotion recognition. Main results. Furthermore, to validate the effectiveness of the proposed method, we execute subject-independent experiments within the DEAP dataset. The proposed method has achieved the accuracies of 73.82% and 74.55% for valence and arousal classification. Significance. The results of extensive experiments verify the outstanding performance of the proposed method.
Read full abstract