Abstract

The goal of multi-modal neural machine translation (MNMT) is to incorporate language-agnostic visual information into the text to enhance machine translation performance. However, owing to the inherent differences between image and text, the two modalities inevitably suffer from semantic mismatch. To tackle this issue, this paper adopts a multi-grained visual pivot-guided multi-modal fusion strategy with cross-modal contrastive disentangling to bridge the linguistic gaps between languages. Using the disentangled multi-grained visual information as a cross-lingual pivot, we strengthen the alignment between languages and improve MNMT performance. We first introduce text-guided stacked cross-modal disentangling modules that progressively disentangle the image into two types of visual information: MT-related visual information and background information. We then integrate these two kinds of multi-grained visual elements to assist target-sentence generation. Extensive experiments on four benchmark MNMT datasets show that our approach achieves significant improvements over state-of-the-art (SOTA) approaches on all test sets. In-depth analysis highlights the benefits of the text-guided cross-modal disentangling and visual pivot-based multi-modal fusion strategies in MNMT. We release the code at https://github.com/nlp-mnmt/ConVisPiv-MNMT.
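
The abstract does not give implementation details; the following is a minimal, illustrative PyTorch sketch of one plausible reading of text-guided cross-modal disentangling with a contrastive objective. All names here (TextGuidedDisentangler, contrastive_loss, the sigmoid gating split, and the temperature tau) are assumptions for illustration and do not reproduce the authors' released implementation.

```python
# Hedged sketch: text states attend over image regions, a learned gate softly
# splits the image features into MT-related vs. background streams, and an
# InfoNCE-style contrastive loss pulls text and MT-related visual vectors together.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TextGuidedDisentangler(nn.Module):
    """One disentangling layer: image regions query the source text, then a
    gate splits them into MT-related and background visual representations."""

    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.gate = nn.Linear(d_model, 1)  # soft split between the two streams

    def forward(self, text: torch.Tensor, image: torch.Tensor):
        # text:  (B, L_t, d) source-sentence states
        # image: (B, L_v, d) image region / patch features
        attended, _ = self.cross_attn(query=image, key=text, value=text)
        alpha = torch.sigmoid(self.gate(attended))   # (B, L_v, 1) relevance gate
        mt_related = alpha * image                   # text-relevant visual content
        background = (1.0 - alpha) * image           # residual background content
        return mt_related, background


def contrastive_loss(text_vec: torch.Tensor, vis_vec: torch.Tensor, tau: float = 0.07):
    """Symmetric InfoNCE over pooled text and MT-related visual vectors."""
    text_vec = F.normalize(text_vec, dim=-1)
    vis_vec = F.normalize(vis_vec, dim=-1)
    logits = text_vec @ vis_vec.t() / tau            # (B, B) similarity matrix
    labels = torch.arange(logits.size(0), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels))


if __name__ == "__main__":
    B, L_t, L_v, d = 4, 12, 49, 512
    text = torch.randn(B, L_t, d)
    image = torch.randn(B, L_v, d)
    layer = TextGuidedDisentangler(d)
    mt_rel, bg = layer(text, image)
    loss = contrastive_loss(text.mean(dim=1), mt_rel.mean(dim=1))
    print(mt_rel.shape, bg.shape, loss.item())
```

In the paper's description such layers are stacked so the split is refined progressively, and the two resulting streams are fused with the decoder; the sketch above shows only a single layer and the contrastive term.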
