Abstract

Most previous methods for multimodal representation learning ignore the rich correlation information inherently stored in each sample, leading to a lack of robustness when trained on small datasets. Although a few contrastive learning frameworks leverage that information in a self-supervised manner, they generally encourage the intra-sample unimodal representations to be identical, neglecting the modality-specific information carried by individual modalities. In contrast, we propose a novel algorithm that learns the correlations between modalities to facilitate downstream multimodal tasks by leveraging the prior information across samples, and we explore the feasibility of the proposed method on carefully designed unsupervised and supervised auxiliary learning tasks. Specifically, we construct the positive and negative sets for correlation learning from unimodal embeddings of the same sample and of different samples, respectively. A weak predictor is applied to the concatenated unimodal embeddings to learn the correspondence for each set. In this way, the model can correlate unimodal features and discover the shared information across modalities. Unlike contrastive learning methods, the proposed framework is compatible with any number of modalities and can retain modality-specific information, enabling the multimodal representation to capture richer information. Moreover, one of the main novelties of the supervised version is that the sample labels are further utilized to learn more discriminative features: the correlation scores assigned to negative sets vary according to the label differences between the associated samples. Extensive experiments suggest that the proposed method reaches state-of-the-art performance on multimodal sentiment analysis, emotion recognition, and humor detection, and can improve the performance of various fusion approaches.
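
The construction of positive and negative sets described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `build_correlation_pairs`, the choice to swap exactly one modality per negative set, and in particular the linear mapping from label gap to target score are assumptions made here for illustration, since the abstract does not specify them. The resulting (concatenated embedding, target score) pairs are what a weak predictor would be trained on.

```python
import random


def build_correlation_pairs(embeddings, labels=None, num_negatives=1, seed=0):
    """Build (concatenated_embedding, target_score) pairs for correlation learning.

    embeddings: list of samples; each sample is a list with one vector
                (list of floats) per modality. Needs at least two samples.
    labels:     optional list of numeric labels (supervised variant).
    """
    rng = random.Random(seed)
    n = len(embeddings)
    num_modalities = len(embeddings[0])

    # Assumption: label gaps are normalised by the observed label range.
    label_range = 1.0
    if labels is not None:
        label_range = (max(labels) - min(labels)) or 1.0

    pairs = []
    for i in range(n):
        # Positive set: every modality comes from the same sample.
        positive = [x for vec in embeddings[i] for x in vec]
        pairs.append((positive, 1.0))

        for _ in range(num_negatives):
            # Negative set: one modality is taken from a different sample
            # (swapping exactly one modality is an assumption of this sketch).
            j = rng.choice([k for k in range(n) if k != i])
            swapped = rng.randrange(num_modalities)
            negative = []
            for m in range(num_modalities):
                source = j if m == swapped else i
                negative.extend(embeddings[source][m])

            if labels is None:
                target = 0.0  # unsupervised: fixed score for negative sets
            else:
                # Hypothetical mapping: larger label gap, lower score.
                gap = abs(labels[i] - labels[j])
                target = max(0.0, 1.0 - gap / label_range)
            pairs.append((negative, target))
    return pairs


if __name__ == "__main__":
    # Three samples, two modalities, two-dimensional embeddings each.
    toy = [
        [[0.1, 0.2], [0.3, 0.4]],
        [[1.1, 1.2], [1.3, 1.4]],
        [[2.1, 2.2], [2.3, 2.4]],
    ]
    for vector, score in build_correlation_pairs(toy, labels=[0.0, 1.0, 2.0]):
        print(score, vector)
```

Because the pairs are formed by concatenation rather than by pulling unimodal embeddings toward each other, the same routine works for any number of modalities, which mirrors the compatibility claim made in the abstract.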
