Pre-Trained Word Embedding and Language Model Improve Multimodal Machine Translation: A Case Study in Multi30K

Tosho Hirasawa, Aizhan Imankulova, Masahiro Kaneko, Mamoru Komachi

doi:10.1109/access.2022.3185243

Open Access

https://doi.org/10.1109/access.2022.3185243

Journal: IEEE Access
Publication Date: Jan 1, 2022
Citations: 5
License type: CC BY 4.0

Affiliation: Tokyo Metropolitan University

Abstract

Multimodal machine translation (MMT) is an attractive application of neural machine translation (NMT) that commonly incorporates image information. However, the MMT models proposed thus far perform only comparably to, or slightly better than, their text-only counterparts. One potential cause of this limited gain is the lack of large-scale data. Most previous studies mitigate this limitation by employing, in various ways, large-scale textual parallel corpora, which are more accessible than multimodal parallel corpora. However, these corpora are still available only on a limited scale for low-resource language pairs or domains. In this study, we leveraged monolingual (or multimodal monolingual) corpora, which are available at scale in most languages and domains, to improve MMT models. Our approach follows previous unimodal work that uses monolingual corpora to train a word embedding or language model and incorporates it into NMT systems. While these methods demonstrated the advantage of using pre-trained representations, there is still room for MMT models to improve. To this end, our system employs debiasing procedures for the word embedding and a multimodal extension of the language model (visual-language model, VLM) to make better use of pre-trained knowledge in the MMT task. Evaluations on the de facto standard MMT dataset for English–German translation indicate that the well-tailored word embedding and the VLM yield improvements of approximately +1.84 BLEU and +1.63 BLEU, respectively. Evaluation on multiple language pairs shows that both methods are applicable across languages. Beyond the success of our system, we also conducted an extensive analysis of VLM manipulation and identified promising areas for developing better MMT models by exploiting VLM: some benefits brought by either modality are missing, and MMT with VLM generates less fluent translations.
Our code is available at https://github.com/toshohirasawa/mmt-with-monolingual-data.
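
The abstract does not say which debiasing procedure is applied to the pre-trained word embeddings. As a rough illustration only, the sketch below assumes one common post-processing recipe: subtract the mean vector, then remove the dominant principal direction (in the spirit of "All-but-the-Top" post-processing). The function names `center`, `top_direction`, and `debias` are hypothetical and are not taken from the authors' repository; the toy vectors are made up.

```python
import math
import random


def center(vectors):
    """Subtract the mean embedding from every vector."""
    n, dim = len(vectors), len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(dim)]
    return [[v[j] - mean[j] for j in range(dim)] for v in vectors]


def top_direction(vectors, iters=200, seed=0):
    """Approximate the dominant principal direction by power iteration on X^T X.

    Assumes `vectors` is already centered. A seeded random start is used so
    the result is reproducible.
    """
    dim = len(vectors[0])
    rng = random.Random(seed)
    u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in u))
    u = [x / norm for x in u]
    for _ in range(iters):
        # w = X^T (X u), computed without building X^T X explicitly
        proj = [sum(a * b for a, b in zip(v, u)) for v in vectors]
        w = [sum(p * v[j] for p, v in zip(proj, vectors)) for j in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # all vectors are orthogonal to u (or all zero)
            break
        u = [x / norm for x in w]
    return u


def debias(vectors, n_components=1):
    """Center the embeddings, then project out the top principal directions."""
    out = center(vectors)
    for _ in range(n_components):
        u = top_direction(out)
        out = [
            [x - sum(a * b for a, b in zip(v, u)) * ui for x, ui in zip(v, u)]
            for v in out
        ]
    return out


# Toy 2-dimensional "embeddings" (illustrative values only).
toy_embeddings = [[2.0, 0.1], [4.0, -0.1], [6.0, 0.1], [8.0, -0.1]]
debiased = debias(toy_embeddings, n_components=1)
```

In an NMT or MMT system, vectors processed this way would typically be used to initialize the embedding layer; how the authors actually integrate them is described in the paper and repository, not here.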
