Adapt and explore: Multimodal mixup for representation learning

Ronghao Lin, Haifeng Hu

doi:10.1016/j.inffus.2023.102216

Abstract

Research on general multimodal systems has gained significant attention due to the proliferation of multimodal data in the real world. Despite the remarkable performance achieved by existing multimodal representation learning schemes, missing modalities remain a persistent issue that limits the applicability of multimodal systems. To address this issue, we propose a novel approach named M3ixup (Multi-Modal Mixup), which leverages the mixup strategy to improve unimodal and multimodal representation learning while simultaneously increasing robustness against missing modalities. First, we adopt a productive multimodal learning scheme to model representations with modality-specific and joint-modality encoders. This general scheme makes the proposed approach transferable to various multimodal learning scenarios, including supervised, unsupervised, and reinforcement learning. Then, unimodal input mixup and manifold mixup are used to enhance the modality-specific encoders so that they capture intra-modal dynamics. Next, we present multimodal mixup, which mixes different modalities and generates mixed multimodal representations in two steps: adapting and exploring. The adapting step bridges the large information gap between unimodal and multimodal representations in the joint space during alignment, while the exploring step further captures the inter-modal dynamics and exploits the non-linear relationships among different modalities. After that, the mixed views are aligned with the original multimodal representations by contrastive learning. Additionally, we extend the mixup strategy to the loss function of multimodal contrastive learning in two steps to improve the alignment between mixed and original views. Extensive experiments on public datasets in various multimodal learning scenarios demonstrate the superiority of the proposed M3ixup. The code is available at https://github.com/RH-Lin/m3ixup.
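To make the ingredients named in the abstract concrete, the following is a minimal sketch of generic mixup and a generic InfoNCE-style contrastive loss in plain Python. It is not the authors' M3ixup implementation: the function names (`mixup_pair`, `mixup_batch`, `mix_modalities`, `info_nce`) are hypothetical, drawing the mixing coefficient from a Beta(alpha, alpha) distribution follows the standard mixup formulation rather than anything stated in the abstract, and the temperature value is an arbitrary assumption. The released repository should be consulted for the actual method.

```python
import math
import random


def mixup_pair(x_i, x_j, lam):
    """Standard mixup: convex combination lam * x_i + (1 - lam) * x_j."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]


def mixup_batch(batch, alpha=1.0, rng=None):
    """Mix every sample with a randomly chosen partner from the same batch.

    Assumption: lam ~ Beta(alpha, alpha), as in the standard mixup recipe.
    """
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)
    partners = batch[:]
    rng.shuffle(partners)
    mixed = [mixup_pair(x, p, lam) for x, p in zip(batch, partners)]
    return mixed, lam


def mix_modalities(reps, weights):
    """Hypothetical multimodal mixing: convex combination of per-modality
    vectors that already live in a shared space of equal dimension."""
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to 1"
    dim = len(reps[0])
    return [sum(w * r[k] for w, r in zip(weights, reps)) for k in range(dim)]


def cosine(u, v):
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, positives, temperature=0.1):
    """Generic InfoNCE loss: anchors[i] should match positives[i] and
    treat every other positives[j] as a negative."""
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [cosine(a, p) / temperature for p in positives]
        m = max(logits)  # subtract the max for numerical stability
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(anchors)


if __name__ == "__main__":
    rng = random.Random(0)
    text = [[1.0, 0.0], [0.0, 1.0]]   # toy "text" representations
    audio = [[0.9, 0.1], [0.2, 0.8]]  # toy "audio" representations
    mixed_view = [mix_modalities([t, a], [0.5, 0.5]) for t, a in zip(text, audio)]
    print("contrastive loss:", info_nce(mixed_view, text))
    print("input mixup:", mixup_batch(text, alpha=1.0, rng=rng))
```

In this sketch, the mixed view plays the role of the "mixed multimodal representation" and the contrastive loss pulls it toward the matching original representation. How the paper actually chooses mixing weights, applies mixup inside the loss, or separates the adapting and exploring steps is not specified in the abstract and is not reproduced here.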

Journal: Information Fusion
Publication Date: Dec 28, 2023
Citations: 2

