Abstract

Multimodal Emotion Recognition (MER) integrates information from multiple modalities, including audio, visual, text, and physiological signals, to comprehensively understand human emotions, and has emerged as a vibrant area within human–computer interaction. Researchers have developed many methods for this task, but most rely on supervised learning with labeled data and struggle to handle missing modalities. To address these issues, we propose a Multiplex Graph Aggregation and Feature Refinement framework for unsupervised incomplete MER, comprising four modules: Completion, Aggregation, Refinement, and Embedding. Specifically, we first capture correlations between samples using graph structures, which aids the completion of missing data and the multiplex aggregation of multimodal features. We then refine the aggregated features and align and enhance the embedding features to obtain fused representations that are consistent, highly separable, and conducive to emotion recognition. Experimental results on multimodal emotion recognition datasets demonstrate that our method achieves state-of-the-art performance among unsupervised methods, validating its effectiveness.
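
The abstract describes a four-stage pipeline (Completion, Aggregation, Refinement, Embedding) built on graphs that capture correlations between samples. The sketch below illustrates one plausible reading of the first three stages under simple assumptions; it is not the paper's actual method. It assumes a cosine-similarity kNN graph per modality, neighbour-mean imputation for samples with a missing modality, graph smoothing averaged across the multiplex layers, and a low-rank SVD projection as a stand-in for refinement. All function names and design choices here are hypothetical.

    # Minimal, illustrative sketch of a graph-based complete-then-aggregate pipeline.
    # Every design choice below is an assumption made for illustration only.
    import numpy as np

    def knn_graph(X, k=5):
        """Row-normalized kNN adjacency built from cosine similarity between samples."""
        Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-8)
        S = Xn @ Xn.T
        np.fill_diagonal(S, -np.inf)                 # keep each sample out of its own neighbour set
        idx = np.argsort(-S, axis=1)[:, :k]          # top-k most similar samples per row
        A = np.zeros_like(S)
        A[np.repeat(np.arange(len(X)), k), idx.ravel()] = 1.0
        return A / A.sum(axis=1, keepdims=True)

    def complete_modality(X, observed, A):
        """Completion: replace rows of missing samples with a graph-weighted mean of observed neighbours."""
        W = A * observed[None, :]                    # drop edges that point at missing samples
        W = W / (W.sum(axis=1, keepdims=True) + 1e-8)
        imputed = W @ (X * observed[:, None])
        X_out = X.copy()
        miss = ~observed.astype(bool)
        X_out[miss] = imputed[miss]
        return X_out

    def multiplex_aggregate(views, graphs):
        """Aggregation: one graph-smoothing pass per modality, then a mean over the multiplex layers.
        Assumes every modality has already been projected to a shared dimension."""
        return np.mean(np.stack([A @ X for X, A in zip(views, graphs)]), axis=0)

    def refine(Z, n_components=16):
        """Refinement (stand-in): low-rank SVD projection to sharpen the fused representation."""
        U, s, _ = np.linalg.svd(Z - Z.mean(axis=0), full_matrices=False)
        return U[:, :n_components] * s[:n_components]

A toy call, with three modalities in a shared 32-dimensional space and roughly 20% of samples missing per modality, could look like this:

    rng = np.random.default_rng(0)
    views = [rng.normal(size=(100, 32)) for _ in range(3)]
    masks = [(rng.random(100) > 0.2).astype(float) for _ in range(3)]
    graphs = [knn_graph(X * m[:, None]) for X, m in zip(views, masks)]
    completed = [complete_modality(X, m, A) for X, m, A in zip(views, masks, graphs)]
    fused = refine(multiplex_aggregate(completed, graphs))   # (100, 16) fused embedding

The paper's Embedding module, with its alignment and enhancement operations, is not sketched here because the abstract does not specify how those operations are defined.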
