Abstract

Sentiment analysis, a subtask of affective computing, endows machines with the ability to sense and comprehend emotions. Recently, research attention has shifted from the traditional isolated modality to ubiquitous multi-modalities, which requires modeling the complex relationships between modalities and extracting task-relevant information. However, current methods focus only on modality consistency, neglecting both the complementary relationships across modalities and the specificity information within each modality. Furthermore, most studies treat multimodal fusion as a mere process of information integration, without considering the control of information flow. We propose a variational model that explicitly decomposes unimodal representations into two types, capturing consistency and specificity information, respectively. Following the information bottleneck principle, our implementation optimizes the fusion representation by maximizing its mutual information with the consistency and specificity representations while minimizing its mutual information with the raw inputs. With information integrity constraints and task label supervision, the fusion representation preserves task-relevant information and discards irrelevant noise. Finally, quantitative and qualitative experiments on two benchmark datasets show that our method achieves competitive performance on multimodal sentiment analysis compared to recent baselines.
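
For intuition, the information-bottleneck-style objective described above can be sketched as the following expression; the notation (fusion representation Z, consistency and specificity representations Z_c and Z_s, raw inputs X, task label Y, and trade-off weights lambda and beta) is illustrative and not taken from the paper itself.

% Illustrative sketch of an information-bottleneck-style fusion objective (assumed notation).
% Z: fusion representation; Z_c, Z_s: consistency and specificity representations;
% X: raw multimodal inputs; Y: sentiment label; lambda, beta: trade-off weights.
\begin{equation}
  \max_{Z} \; I(Z; Z_c) + I(Z; Z_s) + \lambda\, I(Z; Y) - \beta\, I(Z; X)
\end{equation}

Here the first two terms encourage the fusion representation to retain the decomposed consistency and specificity information, the supervised term I(Z; Y) keeps it predictive of the sentiment label, and the penalty on I(Z; X) compresses away task-irrelevant noise from the raw inputs.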
