Abstract

In multimodal tasks, the relative importance of text and image information often varies across input cases. To model these differences in importance, we propose a high-performance and highly general Dual-Router Dynamic Framework (DRDF), consisting of a Dual-Router, an MWF-Layer, experts, and an expert fusion unit. The text router and image router in the Dual-Router take text modal information and image modal information respectively, and the MWF-Layer is responsible for judging the importance of each modality. Based on this judgment, the MWF-Layer generates fused weights for the subsequent expert fusion. The experts can adopt a variety of backbones that match the current multimodal or unimodal task. DRDF features high generality and modularity: we test 12 backbones, such as Visual BERT, and their corresponding DRDF instances on the multimodal dataset Hateful Memes and the unimodal datasets CIFAR10, CIFAR100, and TinyImagenet. Our DRDF instances outperform those backbones. We also validate the effectiveness of DRDF's components through ablation studies, and discuss the reasoning and ideas behind the design of DRDF.
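The abstract only describes DRDF at a high level, so the following is a minimal sketch of how the described forward pass could be wired together. Every concrete choice here is an assumption: the linear routers, the sum-plus-softmax inside the MWF-Layer, the linear experts, and all names (`DRDFSketch`, `text_router`, `image_router`) and dimensions are hypothetical stand-ins, not the paper's actual design.

```python
# Hypothetical sketch of a DRDF-style forward pass: two modality routers,
# an MWF-Layer that fuses their signals into expert weights, and a
# weighted fusion of expert outputs. Not the paper's implementation.
import torch
import torch.nn as nn


class DRDFSketch(nn.Module):
    def __init__(self, text_dim: int, image_dim: int,
                 hidden_dim: int, num_experts: int):
        super().__init__()
        # Dual-Router: one (assumed linear) router per modality.
        self.text_router = nn.Linear(text_dim, num_experts)
        self.image_router = nn.Linear(image_dim, num_experts)
        # Experts: placeholder linear layers; in the paper each expert
        # would be a backbone suited to the task (e.g. Visual BERT).
        self.experts = nn.ModuleList(
            nn.Linear(text_dim + image_dim, hidden_dim)
            for _ in range(num_experts)
        )

    def forward(self, text_feat: torch.Tensor,
                image_feat: torch.Tensor) -> torch.Tensor:
        # Route each modality independently.
        text_logits = self.text_router(text_feat)      # (B, num_experts)
        image_logits = self.image_router(image_feat)   # (B, num_experts)
        # MWF-Layer stand-in: judge modality importance and emit fused
        # weights (assumed here to be a simple sum followed by softmax).
        fused_weights = torch.softmax(text_logits + image_logits, dim=-1)
        # Run every expert on the joint features.
        joint = torch.cat([text_feat, image_feat], dim=-1)
        expert_outs = torch.stack(
            [expert(joint) for expert in self.experts], dim=1
        )                                              # (B, E, hidden_dim)
        # Expert fusion unit: weighted sum of expert outputs.
        return (fused_weights.unsqueeze(-1) * expert_outs).sum(dim=1)


# Usage: random tensors stand in for text/image encoder outputs.
model = DRDFSketch(text_dim=768, image_dim=2048,
                   hidden_dim=512, num_experts=4)
out = model(torch.randn(2, 768), torch.randn(2, 2048))
print(out.shape)  # torch.Size([2, 512])
```

The key design point this sketch illustrates is that the fused weights are computed per input, so examples where text dominates and examples where the image dominates can weight the experts differently.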
