Abstract

Multimodal sentiment analysis (MSA) has attracted increasing attention in recent years. This paper focuses on learning representations of multimodal data to improve prediction performance. We propose a model that assists the learning of modality representations through multitask learning and contrastive learning, and that obtains dynamic loss weights by considering the homoscedastic uncertainty of each task. Specifically, we design two groups of subtasks, which predict the sentiment polarity of unimodal and bimodal representations, to guide representation learning via a hard parameter-sharing mechanism in the upstream neural network; the weight of each task's loss is learned from its homoscedastic uncertainty. Moreover, a contrastive-learning-based training strategy is designed to reduce the inconsistency between training and inference caused by the randomness of the dropout layer, by minimizing the mean squared error (MSE) between the predictions of two submodels. Experimental results on the MOSI and MOSEI datasets show that, by comprehensively considering intramodality and intermodality interactions, our method outperforms current state-of-the-art methods.
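
The abstract mentions two mechanisms that can be sketched concretely. Below is a minimal, hypothetical PyTorch illustration (not the authors' code) of loss weighting based on homoscedastic uncertainty in the style of Kendall et al.; the class name and the simplified form `exp(-s_i) * L_i + s_i` are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class UncertaintyWeightedLoss(nn.Module):
    """Combines per-task losses with weights derived from learnable
    homoscedastic uncertainty. Each task i contributes
    exp(-s_i) * L_i + s_i, where s_i = log(sigma_i^2) is a learned scalar
    (a common simplified variant of the Kendall et al. formulation)."""

    def __init__(self, num_tasks):
        super().__init__()
        # one log-variance parameter per task, initialised to 0 (sigma^2 = 1)
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))

    def forward(self, task_losses):
        # task_losses: sequence of scalar losses, one per subtask
        total = 0.0
        for i, loss in enumerate(task_losses):
            precision = torch.exp(-self.log_vars[i])  # 1 / sigma_i^2
            total = total + precision * loss + self.log_vars[i]
        return total
```

Similarly, a minimal sketch of the dropout-consistency regularizer the abstract describes: the same batch is passed through the network twice, the differing dropout masks yield two "submodels", and the MSE between their predictions is added to the training loss. The function name and model interface are again assumptions for illustration.

```python
import torch.nn.functional as F

def dropout_consistency_loss(model, inputs):
    """Two stochastic forward passes over the same batch; dropout masks
    differ between passes, so the outputs act as two submodels. The MSE
    between them penalises the train/inference gap introduced by dropout."""
    pred_a = model(inputs)  # first forward pass (dropout mask A)
    pred_b = model(inputs)  # second forward pass (dropout mask B)
    return F.mse_loss(pred_a, pred_b)
```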
