Abstract

In recent years, neural machine translation, especially multimodal neural machine translation, has developed rapidly and has been widely applied to natural language processing tasks such as event detection and sentiment classification. Existing multimodal neural machine translation models are mostly built on an attention-based encoder-decoder framework that further integrates spatial visual features. However, owing to the pervasive scarcity of parallel corpora and the limited semantic interaction between modalities, translation quality is difficult to guarantee. This paper therefore proposes a multimodal machine translation model that integrates external linguistic knowledge. Specifically, on the encoder side, a pre-trained BERT model serves as an additional encoder that cooperates with the original text encoder and the image encoder; together, the three encoders produce better source-side text and image representations. The decoder then generates the translation from these source-side image and text representations. In summary, this paper studies visual-text semantic interaction on both the encoder side and the decoder side, and further improves translation quality by introducing external linguistic knowledge. We compared the multimodal neural machine translation model with pre-trained BERT against other baseline models on the English-German translation task of the Multi30k dataset. The results show that the model significantly improves the quality of multimodal neural machine translation, which also confirms the importance of integrating external knowledge and visual-text semantic interaction.
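
To make the encoder-side design concrete, below is a minimal sketch in PyTorch of how a frozen pre-trained BERT encoder, a trainable text encoder, and an image-feature encoder could cooperate to produce the source-side text and image representations. The module names, the gated fusion, and the feature dimensions are illustrative assumptions for this sketch, not the paper's exact architecture.

    # Minimal sketch (PyTorch + HuggingFace Transformers); not the paper's exact model.
    import torch
    import torch.nn as nn
    from transformers import BertModel, BertTokenizer

    class MultimodalEncoder(nn.Module):
        def __init__(self, d_model=768, n_layers=4, n_heads=8, img_feat_dim=2048):
            super().__init__()
            # Additional encoder: frozen pre-trained BERT supplying external
            # linguistic knowledge.
            self.bert = BertModel.from_pretrained("bert-base-uncased")
            for p in self.bert.parameters():
                p.requires_grad = False
            # Original (trainable) text encoder, applied on top of BERT's output.
            layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            self.text_encoder = nn.TransformerEncoder(layer, n_layers)
            # Image encoder: projects pre-extracted spatial visual features
            # (e.g. a 7x7 grid of 2048-dim CNN features) into the model dimension.
            self.img_proj = nn.Linear(img_feat_dim, d_model)
            # Simple gated fusion of the BERT and text-encoder representations
            # (an assumed fusion choice for this sketch).
            self.gate = nn.Linear(2 * d_model, d_model)

        def forward(self, input_ids, attention_mask, img_feats):
            bert_out = self.bert(input_ids=input_ids,
                                 attention_mask=attention_mask).last_hidden_state
            text_out = self.text_encoder(
                bert_out, src_key_padding_mask=~attention_mask.bool())
            g = torch.sigmoid(self.gate(torch.cat([bert_out, text_out], dim=-1)))
            fused_text = g * bert_out + (1 - g) * text_out  # (batch, seq, d_model)
            visual = self.img_proj(img_feats)               # (batch, regions, d_model)
            return fused_text, visual

    # Usage sketch: a standard attention-based decoder (not shown) would attend
    # over both fused_text and visual to generate the target-language sentence.
    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    batch = tokenizer(["a man rides a horse"], return_tensors="pt", padding=True)
    img_feats = torch.randn(1, 49, 2048)  # placeholder spatial visual features
    fused_text, visual = MultimodalEncoder()(
        batch["input_ids"], batch["attention_mask"], img_feats)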

Highlights

  • The real world that human beings live in is a space where text, sound, image, and video coexist

  • Since Kalchbrenner et al. proposed the concept of neural machine translation in 2013, it has quickly achieved results comparable to, or even better than, traditional statistical machine translation and has gradually become a research hotspot

  • Experiments were performed on multiple language pairs of the Multi30k dataset, and the results show that the proposed model significantly improves the quality of multimodal neural machine translation


Summary

INTRODUCTION

The real world that human beings live in is a space where text, sound, image, and video coexist. Pre-training techniques and models such as ELMo (Peters et al., 2018), GPT/GPT-2 (Radford et al., 2018), BERT (Devlin et al., 2019), the cross-lingual model XLM (Lample & Conneau, 2019), XLNet (Yang et al., 2019b), and RoBERTa (Liu et al., 2019) have repeatedly set new performance records in their respective fields, and pre-training has attracted widespread attention from the machine learning and natural language processing communities. These models are pre-trained on large amounts of unlabeled data to learn better representations of the model input. Multimodal neural machine translation aims to use source-language sentences and the corresponding visual information simultaneously to obtain high-quality target-language translations. In this process, the input sentence and image are first encoded, and the decoder then generates the target-language sentence.
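
As a companion to the visual encoder discussed next, the following is a minimal sketch of how the spatial visual features for an image could be pre-extracted. A torchvision ResNet-50 backbone is used here as an illustrative assumption; the paper does not specify this exact CNN.

    # Minimal sketch of spatial visual feature extraction (assumed ResNet-50 backbone).
    import torch
    import torch.nn as nn
    from PIL import Image
    from torchvision import models, transforms

    # Keep everything up to the last convolutional block, so the output is a
    # 7x7 grid of 2048-dim spatial features instead of a single pooled vector.
    resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
    backbone = nn.Sequential(*list(resnet.children())[:-2]).eval()

    preprocess = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])

    def extract_spatial_features(image_path):
        """Return a (49, 2048) tensor of spatial visual features for one image."""
        img = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
        with torch.no_grad():
            fmap = backbone(img)                            # (1, 2048, 7, 7)
        return fmap.flatten(2).transpose(1, 2).squeeze(0)   # (49, 2048)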

VISUAL ENCODER
BASELINE
VISUAL ANALYSIS
Findings
CONCLUSION AND FUTURE WORK

