Abstract

The pre-training and fine-tuning paradigm has been shown to be effective for low-resource neural machine translation. In this paradigm, models pre-trained on monolingual data are used to initialize translation models, transferring knowledge from the monolingual data into the translation models. In recent years, pre-training models have usually taken sentences with randomly masked words as input and been trained to predict the masked words from the unmasked ones. In this paper, we propose a new pre-training method that still predicts masked words, but randomly replaces some of the unmasked words in the input with their translations in another language. The translation words are taken from bilingual data, so the pre-training data contains both monolingual and bilingual data. We conduct experiments on a Uyghur-Chinese corpus to evaluate our method. The experimental results show that our method gives the pre-training model better generalization ability and helps the translation model achieve better performance. Through a word translation task, we also demonstrate that our method enables the embeddings of the translation model to acquire more alignment knowledge.
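
The sketch below illustrates the kind of input corruption the abstract describes: mask some words, then replace some of the remaining unmasked words with translation words drawn from bilingual data. It is a minimal illustration, not the authors' implementation; the function name, probabilities, and toy dictionary are assumptions.

    import random

    MASK = "[MASK]"

    def corrupt_sentence(tokens, bilingual_dict, mask_prob=0.15, translate_prob=0.15):
        """Build a pre-training input: mask some words, and replace some of the
        remaining (unmasked) words with translations from bilingual data.
        Returns the corrupted input and the original tokens as prediction targets."""
        corrupted = []
        for token in tokens:
            r = random.random()
            if r < mask_prob:
                # Masked position: the model must predict the original token here.
                corrupted.append(MASK)
            elif r < mask_prob + translate_prob and token in bilingual_dict:
                # Unmasked position replaced by a translation word in the other language.
                corrupted.append(random.choice(bilingual_dict[token]))
            else:
                corrupted.append(token)
        return corrupted, tokens

    # Illustrative usage with a hypothetical toy English-Chinese word dictionary.
    toy_dict = {"machine": ["机器"], "translation": ["翻译"]}
    src = "neural machine translation is effective".split()
    print(corrupt_sentence(src, toy_dict))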

Highlights

  • In recent years, neural machine translation (NMT) has achieved rapid development [1,2,3]

  • NMT has reached the level of statistical machine translation (SMT)

  • We propose a simple word translation task, through which we demonstrate that our method helps the embeddings of the translation model acquire more alignment knowledge

Introduction

Neural machine translation (NMT) has achieved rapid development [1,2,3]. An NMT model is usually based on the encoder-decoder architecture. In early NMT models, the encoder converts a variable-length source-language sentence into a fixed-length context vector, and the decoder generates target-language words one by one from this fixed context vector [4]. After the emergence of the attention mechanism [5,6], the output of the encoder is no longer a single fixed-length context vector but a sequence of context vectors of the same length as the input, and the decoder generates target-language words according to a variable context vector that is a weighted sum of these context vectors. Some studies even claim that their NMT systems have achieved human parity in some domains for some languages [3].
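
As a rough illustration of the attention mechanism described above (not the cited models' actual scoring function), the sketch below computes the decoder's context vector as a softmax-weighted sum of per-position encoder outputs; the dot-product scoring and variable names are assumptions for illustration.

    import numpy as np

    def attention_context(decoder_state, encoder_outputs):
        """Weighted sum of encoder outputs (one vector per source position).
        Scores here are plain dot products; real systems use learned scoring."""
        scores = encoder_outputs @ decoder_state      # (src_len,)
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()                      # softmax attention weights
        return weights @ encoder_outputs              # context vector (hidden_dim,)

    # Toy usage: 5 source positions, hidden size 8 (random values for illustration).
    enc = np.random.randn(5, 8)
    dec = np.random.randn(8)
    context = attention_context(dec, enc)
    print(context.shape)  # (8,)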
