Abstract

We investigate the following question for machine translation (MT): can we develop a single universal MT model to serve as the common seed and obtain derivative and improved models on arbitrary language pairs? We propose mRASP, an approach to pre-train a universal multilingual neural machine translation model. The key idea in mRASP is its novel technique of random aligned substitution, which brings words and phrases with similar meanings across multiple languages closer in the representation space. We pre-train an mRASP model on 32 language pairs jointly, using only public datasets. The model is then fine-tuned on downstream language pairs to obtain specialized MT models. We carry out extensive experiments on 42 translation directions across diverse settings, including low-, medium-, and rich-resource pairs, as well as transfer to exotic language pairs. Experimental results demonstrate that mRASP achieves significant performance improvements compared to models trained directly on those target pairs. To our knowledge, this is the first work to verify that multiple low-resource language pairs can be utilized to improve rich-resource MT. Surprisingly, mRASP is even able to improve translation quality on exotic languages that never occur in the pre-training corpus. Code, data, and pre-trained models are available at https://github.com/linzehui/mRASP.
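To make the random aligned substitution (RAS) idea concrete, here is a minimal sketch in Python. It is an illustration under assumptions, not the released mRASP implementation: the function name, the MUSE-style word dictionaries, and the substitution probability are all placeholders.

```python
import random

def random_aligned_substitution(src_tokens, dictionaries, replace_prob=0.3, rng=None):
    """Sketch of random aligned substitution (RAS).

    src_tokens   : list of source-language tokens, e.g. ["how", "are", "you"]
    dictionaries : mapping lang -> {src_word: translation}, e.g. built from
                   MUSE-style bilingual lexicons (assumed format)
    replace_prob : chance of substituting each token that has a dictionary entry
    """
    rng = rng or random.Random()
    out = []
    for tok in src_tokens:
        # Collect candidate translations of this token in other languages.
        candidates = [d[tok] for d in dictionaries.values() if tok in d]
        if candidates and rng.random() < replace_prob:
            out.append(rng.choice(candidates))  # code-switch into another language
        else:
            out.append(tok)
    return out

# Toy usage: English tokens are randomly replaced with French/German entries.
dictionaries = {
    "fr": {"how": "comment", "are": "êtes", "you": "vous"},
    "de": {"you": "du"},
}
print(random_aligned_substitution(["how", "are", "you"], dictionaries, replace_prob=0.5))
```

Training on such code-switched source sentences, paired with the unchanged targets, is what encourages the encoder to place synonymous words from different languages close together in the representation space.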

Highlights

  • Pre-trained language models such as BERT have been highly effective for NLP tasks (Peters et al., 2018; Devlin et al., 2019; Radford et al., 2019; Conneau and Lample, 2019; Liu et al., 2019; Yang et al., 2019).

  • We propose multilingual Random Aligned Substitution Pre-training (mRASP), a method to pre-train a machine translation (MT) model for many languages that can serve as a common initial model to fine-tune on arbitrary language pairs. mRASP improves translation performance compared to MT models trained directly on downstream parallel data (a fine-tuning sketch follows this list).

  • For extremely low-resource settings such as En-Be (Belarusian), where the parallel data is too scarce to train an NMT model properly, fine-tuning from the pre-trained model boosts performance.
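As a rough illustration of the pre-train-then-fine-tune workflow in the highlights above, the PyTorch sketch below reloads a saved "universal" checkpoint into a generic encoder-decoder Transformer and runs a few fine-tuning steps on one downstream pair. The model class, vocabulary size, checkpoint path, and toy batch are assumptions made to keep the example self-contained; they are not the released mRASP code.

```python
import torch
import torch.nn as nn

# Illustrative only: a small generic Transformer stands in for the universal mRASP model.
VOCAB_SIZE, D_MODEL = 1000, 128  # toy sizes; a real shared multilingual vocabulary is much larger

class Seq2Seq(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, D_MODEL)
        self.transformer = nn.Transformer(d_model=D_MODEL, nhead=4,
                                          num_encoder_layers=2, num_decoder_layers=2,
                                          batch_first=True)
        self.proj = nn.Linear(D_MODEL, VOCAB_SIZE)

    def forward(self, src_ids, tgt_ids):
        hidden = self.transformer(self.embed(src_ids), self.embed(tgt_ids))
        return self.proj(hidden)

model = Seq2Seq()

# 1) Pre-training would produce a universal checkpoint; here we save and reload a fresh
#    model just to keep the example runnable (the path is a placeholder).
torch.save(model.state_dict(), "universal_seed.pt")
model.load_state_dict(torch.load("universal_seed.pt"))

# 2) Fine-tune on one downstream pair (e.g. En-De) with the usual NMT cross-entropy loss.
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()
src = torch.randint(0, VOCAB_SIZE, (8, 10))    # toy source batch
tgt_in = torch.randint(0, VOCAB_SIZE, (8, 9))  # decoder input (shifted target)
tgt_out = torch.randint(0, VOCAB_SIZE, (8, 9)) # gold target tokens
for _ in range(3):                             # a few fine-tuning steps
    logits = model(src, tgt_in)
    loss = loss_fn(logits.reshape(-1, VOCAB_SIZE), tgt_out.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The point is only the shape of the workflow: a single pre-trained seed model, followed by pair-specific continued training with the standard NMT objective.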


Summary

Introduction

Pre-trained language models such as BERT have been highly effective for NLP tasks (Peters et al., 2018; Devlin et al., 2019; Radford et al., 2019; Conneau and Lample, 2019; Liu et al., 2019; Yang et al., 2019). However, such models are not easy to fine-tune directly for MT unless sophisticated techniques are applied (Yang et al., 2020). Existing pre-training approaches such as MASS (Song et al., 2019) and mBART (Liu et al., 2020) rely on auto-encoding objectives, which differ from the translation objective, and their fine-tuned MT models still do not achieve adequate improvements. Moreover, existing MT pre-training approaches focus on using multilingual models to improve low- or medium-resource translation; there has not been a single pre-trained MT model that improves translation for arbitrary language pairs, including rich-resource settings such as English-French.

