Abstract

We investigate the following question for machine translation (MT): can we develop a single universal MT model to serve as the common seed and obtain derivative and improved models on arbitrary language pairs? We propose mRASP, an approach to pre-train a universal multilingual neural machine translation model. The key idea in mRASP is its novel technique of random aligned substitution, which brings words and phrases with similar meanings across multiple languages closer in the representation space. We pre-train an mRASP model on 32 language pairs jointly, using only public datasets. The model is then fine-tuned on downstream language pairs to obtain specialized MT models. We carry out extensive experiments on 42 translation directions across diverse settings, including low-, medium-, and rich-resource pairs, as well as transfer to exotic language pairs. Experimental results demonstrate that mRASP achieves significant performance improvements compared to models trained directly on those target pairs. To our knowledge, this is the first work to verify that multiple low-resource language pairs can be utilized to improve rich-resource MT. Surprisingly, mRASP is even able to improve translation quality on exotic languages that never occur in the pre-training corpus. Code, data, and pre-trained models are available at https://github.com/linzehui/mRASP.
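To make the random aligned substitution (RAS) idea concrete, here is a minimal sketch in Python. It is an illustration under assumptions, not the released mRASP implementation: the function name, the MUSE-style word dictionaries, and the substitution probability are all placeholders.

```python
import random

def random_aligned_substitution(src_tokens, dictionaries, replace_prob=0.3, rng=None):
    """Sketch of random aligned substitution (RAS).

    src_tokens   : list of source-language tokens, e.g. ["how", "are", "you"]
    dictionaries : mapping lang -> {src_word: translation}, e.g. built from
                   MUSE-style bilingual lexicons (assumed format)
    replace_prob : chance of substituting each token that has a dictionary entry
    """
    rng = rng or random.Random()
    out = []
    for tok in src_tokens:
        # Collect candidate translations of this token in other languages.
        candidates = [d[tok] for d in dictionaries.values() if tok in d]
        if candidates and rng.random() < replace_prob:
            out.append(rng.choice(candidates))  # code-switch into another language
        else:
            out.append(tok)
    return out

# Toy usage: English tokens are randomly replaced with French/German entries.
dictionaries = {
    "fr": {"how": "comment", "are": "êtes", "you": "vous"},
    "de": {"you": "du"},
}
print(random_aligned_substitution(["how", "are", "you"], dictionaries, replace_prob=0.5))
```

Training on such code-switched source sentences, paired with the unchanged targets, is what encourages the encoder to place synonymous words from different languages close together in the representation space.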

Highlights

  • Pre-trained language models such as BERT have been highly effective for NLP tasks (Peters et al., 2018; Devlin et al., 2019; Radford et al., 2019; Conneau and Lample, 2019; Liu et al., 2019; Yang et al., 2019).

  • We propose multilingual Random Aligned Substitution Pre-training (mRASP), a method to pre-train a machine translation (MT) model for many languages that can serve as a common initial model to fine-tune on arbitrary language pairs. mRASP improves translation performance compared to MT models trained directly on downstream parallel data (a fine-tuning sketch follows this list).

  • For extremely low-resource settings such as En-Be (Belarusian), where the parallel data is too scarce to train an NMT model properly, fine-tuning from the pre-trained model boosts performance.
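As a rough illustration of the pre-train-then-fine-tune workflow in the highlights above, the PyTorch sketch below reloads a saved "universal" checkpoint into a generic encoder-decoder Transformer and runs a few fine-tuning steps on one downstream pair. The model class, vocabulary size, checkpoint path, and toy batch are assumptions made to keep the example self-contained; they are not the released mRASP code.

```python
import torch
import torch.nn as nn

# Illustrative only: a small generic Transformer stands in for the universal mRASP model.
VOCAB_SIZE, D_MODEL = 1000, 128  # toy sizes; a real shared multilingual vocabulary is much larger

class Seq2Seq(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, D_MODEL)
        self.transformer = nn.Transformer(d_model=D_MODEL, nhead=4,
                                          num_encoder_layers=2, num_decoder_layers=2,
                                          batch_first=True)
        self.proj = nn.Linear(D_MODEL, VOCAB_SIZE)

    def forward(self, src_ids, tgt_ids):
        hidden = self.transformer(self.embed(src_ids), self.embed(tgt_ids))
        return self.proj(hidden)

model = Seq2Seq()

# 1) Pre-training would produce a universal checkpoint; here we save and reload a fresh
#    model just to keep the example runnable (the path is a placeholder).
torch.save(model.state_dict(), "universal_seed.pt")
model.load_state_dict(torch.load("universal_seed.pt"))

# 2) Fine-tune on one downstream pair (e.g. En-De) with the usual NMT cross-entropy loss.
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()
src = torch.randint(0, VOCAB_SIZE, (8, 10))    # toy source batch
tgt_in = torch.randint(0, VOCAB_SIZE, (8, 9))  # decoder input (shifted target)
tgt_out = torch.randint(0, VOCAB_SIZE, (8, 9)) # gold target tokens
for _ in range(3):                             # a few fine-tuning steps
    logits = model(src, tgt_in)
    loss = loss_fn(logits.reshape(-1, VOCAB_SIZE), tgt_out.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The point is only the shape of the workflow: a single pre-trained seed model, followed by pair-specific continued training with the standard NMT objective.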


Summary

Introduction

Pre-trained language models such as BERT have been highly effective for NLP tasks (Peters et al., 2018; Devlin et al., 2019; Radford et al., 2019; Conneau and Lample, 2019; Liu et al., 2019; Yang et al., 2019). However, such models are not easy to fine-tune directly for MT unless sophisticated techniques are applied (Yang et al., 2020). Existing pre-training approaches such as MASS (Song et al., 2019) and mBART (Liu et al., 2020) rely on auto-encoding objectives, which differ from the translation objective, and their fine-tuned MT models still do not achieve adequate improvements. Moreover, existing MT pre-training approaches focus on using multilingual models to improve low- or medium-resource translation; there has not been a single pre-trained MT model that improves translation for arbitrary language pairs, including rich-resource settings such as English-French.

