Abstract

In this article, we conduct an extensive quantitative error analysis of different multi-modal neural machine translation (MNMT) models which integrate visual features into different parts of both the encoder and the decoder. We investigate the scenario where models are trained on an in-domain training data set of parallel sentence pairs with images. We analyse two different types of MNMT models, which use global and local image features: the former encode an image globally, i.e. there is one feature vector representing an entire image, whereas the latter encode spatial information, i.e. there are multiple feature vectors, each encoding a different portion of the image. We conduct an error analysis of translations generated by different MNMT models as well as text-only baselines, studying how multi-modal models compare when translating both visual and non-visual terms. In general, we find that the additional multi-modal signals consistently improve translations, even more so when using simpler MNMT models that rely on global visual features. We also find that not only are translations of terms with a strong visual connotation improved, but almost all kinds of errors decrease when multi-modal models are used.
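
As an illustration of the difference between the two feature types, the following sketch extracts one global vector for the whole image and one vector per spatial location. It assumes PyTorch and torchvision with a pretrained VGG19 used purely as a feature extractor; variable names and dimensions are illustrative, not the exact setup of the models analysed in the article.

    import torch
    import torchvision.models as models

    # Pretrained CNN used only as a feature extractor (an assumption for this sketch).
    cnn = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).eval()

    image = torch.randn(1, 3, 224, 224)  # dummy stand-in for a preprocessed image

    with torch.no_grad():
        # Local (spatial) features: the last convolutional feature map,
        # i.e. one 512-dimensional vector per spatial location (7 x 7 = 49 here).
        conv_feats = cnn.features(image)                       # (1, 512, 7, 7)
        local_feats = conv_feats.flatten(2).transpose(1, 2)    # (1, 49, 512)

        # Global features: a single vector summarising the entire image,
        # here the activations of a fully connected layer (4096-dimensional).
        pooled = cnn.avgpool(conv_feats).flatten(1)             # (1, 25088)
        global_feat = cnn.classifier[:4](pooled)                # (1, 4096)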

Highlights

  • Neural machine translation (NMT) has recently been successfully tackled as a sequence-to-sequence learning problem (Kalchbrenner and Blunsom 2013; Cho et al. 2014; Sutskever et al. 2014).

  • This work aims to provide a comprehensive quantitative error analysis of translations generated with different variants of multi-modal NMT (MNMT) models, namely the MNMT models introduced in Calixto et al. (2017) and Calixto and Liu (2017).

  • We conducted an extensive error analysis of the translations generated by two baselines, a phrase-based statistical MT (PBSMT) model and a standard attention-based NMT model, and by MNMT models that incorporate images into state-of-the-art attention-based NMT: by using images as words in the source sentence, to initialise the encoder’s hidden state, as additional data in the initialisation of the decoder’s hidden state, and by means of an additional, independent visual attention mechanism; one of these strategies is sketched below.
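
A minimal sketch of one of these strategies, using a global image feature vector as additional data when initialising the decoder's hidden state, is shown below. PyTorch is assumed; the module, names, and dimensions are illustrative, not the exact published architecture.

    import torch
    import torch.nn as nn

    class ImageAwareDecoderInit(nn.Module):
        """Combine a source-sentence summary with a global image feature
        to produce the decoder's initial hidden state."""

        def __init__(self, src_dim: int, img_dim: int, hid_dim: int):
            super().__init__()
            self.src_proj = nn.Linear(src_dim, hid_dim)  # textual contribution
            self.img_proj = nn.Linear(img_dim, hid_dim)  # visual contribution

        def forward(self, src_summary, img_feat):
            # src_summary: (batch, src_dim), e.g. the mean of the encoder's hidden states
            # img_feat:    (batch, img_dim), e.g. a 4096-d global CNN feature
            return torch.tanh(self.src_proj(src_summary) + self.img_proj(img_feat))

    # Usage with dummy tensors: a batch of two sentences and their images.
    init = ImageAwareDecoderInit(src_dim=1000, img_dim=4096, hid_dim=500)
    h0 = init(torch.randn(2, 1000), torch.randn(2, 4096))  # (2, 500) initial state

The other strategies listed above follow the same pattern: the projected image vector can instead be prepended or appended to the source word embeddings, used to initialise the encoder, or attended over with a separate visual attention mechanism.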

Summary

Introduction

Neural machine translation (NMT) has recently been successfully tackled as a sequence-to-sequence (seq2seq) learning problem (Kalchbrenner and Blunsom 2013; Cho et al. 2014; Sutskever et al. 2014). In this problem, each training example consists of one source and one target variable-length sequence, and there is no prior information regarding the alignments between the two. Textual context alone, however, is not always sufficient to resolve ambiguities in the source sentence. To mention two rather trivial examples of ambiguity: “The beautiful jaguar is really fast” has an ambiguous noun phrase, and the textual context (“is really fast”) cannot really help disambiguate it; and the classical “The man on the hill saw the boy with a telescope” famously admits many different interpretations (Church and Patil 1982). In both examples, having an image illustrative of the sentence could be the additional signal that enables the model to arrive at the correct sentence interpretation and translation.
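
Concretely, under the seq2seq formulation used in the works cited above, the model estimates the probability of a target sentence y = (y_1, ..., y_T) given a source sentence x by factorising it over target positions:

    p(y | x) = ∏_{t=1}^{T} p(y_t | y_{<t}, x),

where each conditional is computed by the decoder from its current hidden state and, in attention-based models, a context vector computed over the encoder's hidden states.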

