Abstract

Built on the conventional attentional encoder-decoder framework, multi-modal neural machine translation (NMT) further incorporates spatial visual features through a separate visual attention mechanism. However, most current multi-modal NMT models first learn the semantic representations of text and image separately and then independently produce two modalities of context vectors for word prediction, neglecting the semantic interactions between the two modalities. In this paper, we argue that jointly modeling text-image semantic interactions is more reasonable for multi-modal NMT, and we propose a novel multi-modal NMT model with deep semantic interactions. Specifically, our model extends conventional multi-modal NMT with two attention networks: (1) a bi-directional attention network for modeling text and image representations, in which the semantic representations of the text are learned by referring to the image representations, and vice versa; and (2) a co-attention network for refining the text and image context vectors, which first summarizes the text into a context vector and then attends this vector to the image to obtain a text-aware visual context vector; the final context vector is computed by re-attending the visual context vector to the text. Results on the Multi30k dataset for different language pairs show that our model significantly outperforms state-of-the-art baselines. We have released our code at https://github.com/DeepLearnXMU/MNMT.
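
The co-attention step described in the abstract can be read as a three-stage attention pipeline: summarize the text, attend to the image with that summary, then re-attend to the text with the resulting visual context. The PyTorch sketch below illustrates one possible reading of that pipeline; the class name CoAttention, the additive (Bahdanau-style) scoring functions, and all tensor dimensions are illustrative assumptions, not the released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoAttention(nn.Module):
    """Sketch of the co-attention step described in the abstract:
    (1) summarize the text states into a text context vector,
    (2) attend that vector to the image regions to get a text-aware
        visual context vector,
    (3) re-attend the visual context vector to the text states to get
        the final context vector.
    Scoring functions and dimensions are assumptions, not taken from
    the released code."""

    def __init__(self, text_dim, img_dim, attn_dim):
        super().__init__()
        self.text_score = nn.Linear(text_dim, 1)                      # step (1)
        self.img_score = nn.Linear(text_dim + img_dim, attn_dim)      # step (2)
        self.img_v = nn.Linear(attn_dim, 1)
        self.retext_score = nn.Linear(img_dim + text_dim, attn_dim)   # step (3)
        self.retext_v = nn.Linear(attn_dim, 1)

    def forward(self, text_states, img_feats):
        # text_states: (batch, src_len, text_dim)  encoder hidden states
        # img_feats:   (batch, regions, img_dim)   spatial visual features

        # (1) text context vector via attention over source words
        alpha = F.softmax(self.text_score(text_states), dim=1)        # (B, L, 1)
        c_text = (alpha * text_states).sum(dim=1)                     # (B, text_dim)

        # (2) text-aware visual context vector
        q = c_text.unsqueeze(1).expand(-1, img_feats.size(1), -1)     # (B, R, text_dim)
        e_img = self.img_v(torch.tanh(
            self.img_score(torch.cat([q, img_feats], dim=-1))))       # (B, R, 1)
        beta = F.softmax(e_img, dim=1)
        c_img = (beta * img_feats).sum(dim=1)                         # (B, img_dim)

        # (3) re-attend the visual context vector to the text
        k = c_img.unsqueeze(1).expand(-1, text_states.size(1), -1)    # (B, L, img_dim)
        e_txt = self.retext_v(torch.tanh(
            self.retext_score(torch.cat([k, text_states], dim=-1))))  # (B, L, 1)
        gamma = F.softmax(e_txt, dim=1)
        c_final = (gamma * text_states).sum(dim=1)                    # (B, text_dim)
        return c_final, c_img


# Toy usage with random tensors (dimensions are arbitrary choices)
if __name__ == "__main__":
    coattn = CoAttention(text_dim=512, img_dim=2048, attn_dim=256)
    text = torch.randn(2, 10, 512)     # 10 source tokens
    image = torch.randn(2, 49, 2048)   # 7x7 grid of CNN region features
    c_final, c_img = coattn(text, image)
    print(c_final.shape, c_img.shape)  # torch.Size([2, 512]) torch.Size([2, 2048])
```

In a full model, c_final (and possibly c_img) would be combined with the decoder state at each time step to predict the next target word; how the bi-directional attention network first shapes text_states and img_feats is not shown here.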
