Abstract

Emotion recognition based on a single physiological modality can be limited by the absence of complementary affective responses from the central and peripheral nervous systems. When multiple modalities are integrated, however, a direct fusion may ignore the heterogeneous nature of the feature domains from one modality to another. Moreover, the distribution of multimodal physiological responses may vary across the different affective scenarios used to elicit the same emotional category, and inter-individual variation may increase because the multimodal features superimpose biometric information. To tackle these issues, we present a hierarchical multimodal network for robust heterogeneous physiological representations (RHPRNet). First, a spatial-frequency pattern extractor identifies electroencephalogram (EEG) representations in both the spatial and frequency domains. Next, inter-domain and inter-modality affective encoders are applied to the statistic-complexity EEG features and the multimodal peripheral features, respectively. All learned representations are integrated through a hierarchical fusion module. To model the multi-peak patterns elicited by different affective scenarios, we design a scenario-adapting pretraining stage, and a random contrastive training loss is applied to mitigate inter-individual variance. Finally, we conduct extensive experiments on three publicly available multimodal databases with two validation approaches to evaluate the performance of RHPRNet.
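
The hierarchical structure outlined above can be pictured with a minimal sketch: separate encoders for the EEG spatial-frequency features, the statistic-complexity EEG features, and the peripheral features, followed by a two-level fusion before classification. The module names, feature dimensions, and use of PyTorch below are illustrative assumptions only; the sketch omits the spatial-frequency pattern extractor, the scenario-adapting pretraining stage, and the random contrastive loss, and does not reproduce the authors' implementation of RHPRNet.

```python
# Illustrative sketch only; all dimensions and module names are assumptions.
import torch
import torch.nn as nn


class AffectiveEncoder(nn.Module):
    """Simple MLP stand-in for the inter-domain / inter-modality affective encoders."""

    def __init__(self, in_dim: int, hidden_dim: int = 128, out_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class HierarchicalFusionSketch(nn.Module):
    """Encodes each feature stream separately, fuses the two EEG domains first,
    then fuses the EEG representation with the peripheral one."""

    def __init__(self, eeg_sf_dim=310, eeg_sc_dim=160, periph_dim=230, n_classes=3):
        super().__init__()
        self.eeg_sf_encoder = AffectiveEncoder(eeg_sf_dim)  # spatial-frequency EEG branch
        self.eeg_sc_encoder = AffectiveEncoder(eeg_sc_dim)  # statistic-complexity EEG branch
        self.periph_encoder = AffectiveEncoder(periph_dim)  # peripheral-signal branch
        self.eeg_fusion = nn.Linear(64 * 2, 64)    # level 1: fuse the two EEG feature domains
        self.modal_fusion = nn.Linear(64 * 2, 64)  # level 2: fuse EEG with peripheral modality
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, eeg_sf, eeg_sc, periph):
        z_sf = self.eeg_sf_encoder(eeg_sf)
        z_sc = self.eeg_sc_encoder(eeg_sc)
        z_pe = self.periph_encoder(periph)
        z_eeg = torch.relu(self.eeg_fusion(torch.cat([z_sf, z_sc], dim=-1)))
        z_all = torch.relu(self.modal_fusion(torch.cat([z_eeg, z_pe], dim=-1)))
        return self.classifier(z_all)


if __name__ == "__main__":
    model = HierarchicalFusionSketch()
    logits = model(torch.randn(8, 310), torch.randn(8, 160), torch.randn(8, 230))
    print(logits.shape)  # torch.Size([8, 3])
```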
