Abstract

Emotion recognition based on a single physiological modality can be limited by the absence of complementary affective responses from the central and peripheral nervous systems. When multiple modalities are integrated, however, a direct fusion may ignore the heterogeneous nature of the feature domains from one modality to another. Moreover, the distribution of multimodal physiological responses may vary across the different affective scenarios used to elicit the same emotional category, and inter-individual variation may increase because the multimodal features superimpose biometric information. To tackle these issues, we present a hierarchical multimodal network for robust heterogeneous physiological representations (RHPRNet). First, a spatial-frequency pattern extractor identifies electroencephalogram (EEG) representations in both the spatial and frequency domains. Next, inter-domain and inter-modality affective encoders are applied to the statistic-complexity EEG features and the multimodal peripheral features, respectively. All learned representations are integrated through a hierarchical fusion module. To model the multi-peak patterns elicited by different affective scenarios, we design a scenario-adapting pretraining stage, and a random contrastive training loss is applied to mitigate inter-individual variance. Finally, we conduct extensive experiments on three publicly available multimodal databases with two validation approaches to evaluate the performance of RHPRNet.
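
The hierarchical structure outlined above can be pictured with a minimal sketch: separate encoders for the EEG spatial-frequency features, the statistic-complexity EEG features, and the peripheral features, followed by a two-level fusion before classification. The module names, feature dimensions, and use of PyTorch below are illustrative assumptions only; the sketch omits the spatial-frequency pattern extractor, the scenario-adapting pretraining stage, and the random contrastive loss, and does not reproduce the authors' implementation of RHPRNet.

```python
# Illustrative sketch only; all dimensions and module names are assumptions.
import torch
import torch.nn as nn


class AffectiveEncoder(nn.Module):
    """Simple MLP stand-in for the inter-domain / inter-modality affective encoders."""

    def __init__(self, in_dim: int, hidden_dim: int = 128, out_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class HierarchicalFusionSketch(nn.Module):
    """Encodes each feature stream separately, fuses the two EEG domains first,
    then fuses the EEG representation with the peripheral one."""

    def __init__(self, eeg_sf_dim=310, eeg_sc_dim=160, periph_dim=230, n_classes=3):
        super().__init__()
        self.eeg_sf_encoder = AffectiveEncoder(eeg_sf_dim)  # spatial-frequency EEG branch
        self.eeg_sc_encoder = AffectiveEncoder(eeg_sc_dim)  # statistic-complexity EEG branch
        self.periph_encoder = AffectiveEncoder(periph_dim)  # peripheral-signal branch
        self.eeg_fusion = nn.Linear(64 * 2, 64)    # level 1: fuse the two EEG feature domains
        self.modal_fusion = nn.Linear(64 * 2, 64)  # level 2: fuse EEG with peripheral modality
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, eeg_sf, eeg_sc, periph):
        z_sf = self.eeg_sf_encoder(eeg_sf)
        z_sc = self.eeg_sc_encoder(eeg_sc)
        z_pe = self.periph_encoder(periph)
        z_eeg = torch.relu(self.eeg_fusion(torch.cat([z_sf, z_sc], dim=-1)))
        z_all = torch.relu(self.modal_fusion(torch.cat([z_eeg, z_pe], dim=-1)))
        return self.classifier(z_all)


if __name__ == "__main__":
    model = HierarchicalFusionSketch()
    logits = model(torch.randn(8, 310), torch.randn(8, 160), torch.randn(8, 230))
    print(logits.shape)  # torch.Size([8, 3])
```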
