Region-attentive multimodal neural machine translation

Yuting Zhao, Mamoru Komachi, Tomoyuki Kajiwara, Chenhui Chu

doi:10.1016/j.neucom.2021.12.076

Open Access

https://doi.org/10.1016/j.neucom.2021.12.076

Abstract

We propose a multimodal neural machine translation (MNMT) method with semantic image regions called region-attentive multimodal neural machine translation (RA-NMT). Existing studies on MNMT have mainly focused on employing global visual features or equally sized grid local visual features extracted by convolutional neural networks (CNNs) to improve translation performance. However, they neglect the effect of semantic information captured inside the visual features. This study utilizes semantic image regions extracted by object detection for MNMT and integrates visual and textual features using two modality-dependent attention mechanisms. The proposed method was implemented and verified on two neural machine translation (NMT) architectures: recurrent neural network (RNN) and self-attention network (SAN). Experimental results on different language pairs of the Multi30k dataset show that our proposed method improves over baselines and outperforms most of the state-of-the-art MNMT methods. Further analysis demonstrates that the proposed method achieves better translation performance because it makes better use of visual features.
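The idea of integrating visual and textual features through two modality-dependent attention mechanisms can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the dot-product scoring, the per-modality projection matrices `w_text` and `w_region`, the concatenation of the two context vectors, and all dimensions are assumptions chosen only to make the example small and runnable.

```python
import numpy as np

def softmax(scores):
    # Numerically stable softmax over the last axis.
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def attend(query, keys):
    # Dot-product attention: score each key against the query, then
    # return the weighted sum of the keys and the attention weights.
    weights = softmax(keys @ query)
    return weights @ keys, weights

def double_attention(decoder_state, text_states, region_feats, w_text, w_region):
    # Each modality gets its own (hypothetical) projection and its own
    # attention; the two context vectors are then concatenated.
    text_ctx, text_w = attend(decoder_state, text_states @ w_text)
    region_ctx, region_w = attend(decoder_state, region_feats @ w_region)
    return np.concatenate([text_ctx, region_ctx]), text_w, region_w

rng = np.random.default_rng(0)
d = 8                                      # assumed decoder state size
decoder_state = rng.normal(size=d)
text_states = rng.normal(size=(5, 16))     # 5 source tokens (assumed)
region_feats = rng.normal(size=(3, 32))    # 3 detected regions (assumed)
w_text = rng.normal(size=(16, d))
w_region = rng.normal(size=(32, d))

context, text_w, region_w = double_attention(
    decoder_state, text_states, region_feats, w_text, w_region
)
print(context.shape, text_w.shape, region_w.shape)
```

In this sketch the decoder would consume `context` at each step; `text_w` and `region_w` show how strongly each source token and each image region was attended.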

Highlights

Neural machine translation (NMT) has achieved state-of-the-art translation performance [43,16,3,45].
Inspired by previous studies [8,13,37,17] that investigated attention mechanisms for multi-source learning, we propose a region-dependent attention mechanism as a promising way to make multimodal NMT (MNMT) attend to the salient regions of an image.
This demonstrates that the proposed method is universal: it can yield consistent performance improvements on different NMT architectures.

Summary

Introduction

Neural machine translation (NMT) has achieved state-of-the-art translation performance [43,16,3,45]. The strength of NMT lies in its ability to learn, directly and in an end-to-end fashion, the mapping from the input text to the associated output text. In the context of recurrent neural networks (RNNs), it was proposed to use an internal state (memory) to process variable-length input sequences, which is much better at capturing long-term dependencies [3]. In the context of self-attention networks (SANs), a special attention mechanism was proposed that selects specific parts of an input sequence by relating its elements at different positions, dispensing with recurrence entirely [45]. Many studies [42,21,4] have increasingly focused on incorporating visual input, that is, images, to improve translation.
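The self-attention mechanism described above, which relates elements of a sequence at different positions without recurrence, is commonly realized as scaled dot-product attention. The following NumPy sketch shows a single attention head under that assumption; the projection matrices `w_q`, `w_k`, `w_v` and all sizes are hypothetical, and the sketch is not taken from the paper.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    # Project the same sequence into queries, keys and values.
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # Every position scores every other position; scale by sqrt(d_k).
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Row-wise softmax (shifted for numerical stability).
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each output position is a weighted mix of all value vectors.
    return weights @ v, weights

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 6, 3            # assumed sizes
x = rng.normal(size=(seq_len, d_model))
w_q = rng.normal(size=(d_model, d_k))
w_k = rng.normal(size=(d_model, d_k))
w_v = rng.normal(size=(d_model, d_k))

out, attn = self_attention(x, w_q, w_k, w_v)
print(out.shape, attn.shape)
```

Because every row of `attn` covers all positions at once, the whole sequence is processed in a single matrix product rather than step by step.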

Journal: Neurocomputing
Publication Date: Jan 3, 2022
Citations: 16
License type: CC BY

Similar Papers

Multimodal Neural Machine Translation Using CNN and Transformer Encoder
Hiroki Takushima ... Takashi Ninomiya
02 Apr 2019

Multimodal Neural Machine Translation for English–Assamese Pair
Sahinur Rahman Laskar ... Bishwaraj Paul
01 Dec 2021

Sanskrit to Hindi language translation using multimodal neural machine translation
Prashanth Kammar ... Parashuram Baraki
Indonesian Journal of Electrical Engineering and Computer Science | VOL. 34
01 May 2024

Feature-level Incongruence Reduction for Multimodal Translation
Zhifeng Li ... Guodong Zhou
01 Jan 2020
