Abstract

Multi-modal representation learning has received significant attention across diverse research domains due to its ability to model a scenario comprehensively. Learning cross-modal interactions is essential for combining multi-modal data into a joint representation. However, conventional cross-attention mechanisms can produce noisy and non-meaningful values when useful cross-modal interactions are absent from the input features, thereby introducing uncertainty into the feature representation. These factors can degrade the performance of downstream tasks. This paper introduces a novel Pre-gating and Contextual Attention Gate (PCAG) module for multi-modal learning, comprising two gating mechanisms that operate at distinct information processing levels within the deep learning model. The first gate filters out interactions that lack informativeness for the downstream task, while the second gate reduces the uncertainty introduced by the cross-attention module. Experimental results on eight multi-modal classification tasks spanning various domains show that the multi-modal fusion model with PCAG outperforms state-of-the-art multi-modal fusion models. Additionally, we elucidate how PCAG effectively processes cross-modality interactions.
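
To make the two-level gating idea concrete, the sketch below shows one possible way such a module could wrap a cross-attention block. This is a minimal illustration, not the authors' implementation: the class names (PreGate, ContextualAttentionGate, GatedCrossAttention), the specific sigmoid gating formulations, and the PyTorch layer choices are all assumptions made for exposition.

```python
# Minimal sketch (not the paper's code): a cross-attention block wrapped by two
# hypothetical gates in the spirit of PCAG. All module names, gating formulas,
# and hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn


class PreGate(nn.Module):
    """Scores each cross-modal interaction and suppresses uninformative ones."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(2 * dim, 1), nn.Sigmoid())

    def forward(self, x, y):
        # x, y: (batch, seq, dim) features whose interaction is being scored
        return self.score(torch.cat([x, y], dim=-1))  # (batch, seq, 1) gate


class ContextualAttentionGate(nn.Module):
    """Re-weights the cross-attention output against the unimodal context,
    damping attended values the context does not support (uncertainty)."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, attended, context):
        g = self.gate(torch.cat([attended, context], dim=-1))
        return g * attended + (1.0 - g) * context


class GatedCrossAttention(nn.Module):
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.pre_gate = PreGate(dim)
        self.ctx_gate = ContextualAttentionGate(dim)

    def forward(self, query_mod, key_mod):
        attended, _ = self.attn(query_mod, key_mod, key_mod)
        attended = self.pre_gate(query_mod, attended) * attended  # gate 1
        return self.ctx_gate(attended, query_mod)                 # gate 2


# Example: fusing two modalities with token features of matching length.
if __name__ == "__main__":
    fusion = GatedCrossAttention(dim=64)
    text = torch.randn(2, 10, 64)
    audio = torch.randn(2, 10, 64)
    print(fusion(text, audio).shape)  # torch.Size([2, 10, 64])
```

In this sketch the first gate acts before the fused features are propagated, zeroing out interactions it deems uninformative, while the second gate blends the attended output back toward the unimodal context, which is one plausible reading of how the uncertainty from cross-attention could be reduced.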
