Abstract

This work uses sequence-to-sequence (seq2seq) models pre-trained on monolingual corpora for machine translation. We pre-train two seq2seq models on monolingual corpora for the source and target languages, then combine the encoder of the source-language model and the decoder of the target-language model, i.e., the cross-connection. Since the two modules are pre-trained completely independently, we add an intermediate layer between the pre-trained encoder and decoder to help them map to each other. These monolingual pre-trained models can serve as a multilingual pre-trained model because any model can be cross-connected with a model pre-trained on any other language, while their capacity is not affected by the number of languages. We demonstrate that our method significantly improves translation performance over a randomly initialized baseline. Moreover, we analyze the appropriate choice of the intermediate layer, the importance of each part of a pre-trained model, and how performance changes with the size of the bitext.
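As a rough, minimal sketch of the cross-connection idea (not the authors' implementation), the snippet below pairs a source-language Transformer encoder with a target-language Transformer decoder and inserts a small intermediate adapter between them; the module sizes, the adapter design, and the stand-in PyTorch modules are all illustrative assumptions.

    # A minimal, hypothetical sketch of the cross-connection: the encoder of a
    # source-language seq2seq model is paired with the decoder of a
    # target-language model, with an intermediate layer in between.
    import torch
    import torch.nn as nn

    D_MODEL, VOCAB_TGT = 512, 32000  # illustrative sizes

    class CrossConnected(nn.Module):
        def __init__(self, src_encoder, tgt_decoder, tgt_generator):
            super().__init__()
            self.src_encoder = src_encoder      # taken from the source-language pre-trained model
            self.intermediate = nn.Sequential(  # learned when fine-tuning on bitext
                nn.Linear(D_MODEL, D_MODEL),
                nn.ReLU(),
                nn.Linear(D_MODEL, D_MODEL),
            )
            self.tgt_decoder = tgt_decoder      # taken from the target-language pre-trained model
            self.tgt_generator = tgt_generator  # output projection of the target-language model

        def forward(self, src_emb, tgt_emb):
            memory = self.src_encoder(src_emb)       # encode the source sentence
            memory = self.intermediate(memory)       # map source representations toward the target space
            out = self.tgt_decoder(tgt_emb, memory)  # target decoder attends to the mapped memory
            return self.tgt_generator(out)           # logits over the target vocabulary

    # Stand-ins for the pre-trained modules (randomly initialized here for illustration).
    src_encoder = nn.TransformerEncoder(
        nn.TransformerEncoderLayer(D_MODEL, nhead=8, batch_first=True), num_layers=6)
    tgt_decoder = nn.TransformerDecoder(
        nn.TransformerDecoderLayer(D_MODEL, nhead=8, batch_first=True), num_layers=6)
    model = CrossConnected(src_encoder, tgt_decoder, nn.Linear(D_MODEL, VOCAB_TGT))

    src = torch.randn(2, 10, D_MODEL)  # (batch, src_len, d_model) embedded source tokens
    tgt = torch.randn(2, 7, D_MODEL)   # (batch, tgt_len, d_model) embedded target tokens
    print(model(src, tgt).shape)       # torch.Size([2, 7, 32000])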

Highlights

  • Transfer learning with pre-training and fine-tuning has pushed the state-of-the-art results in many natural language processing (NLP) tasks since the famous success of BERT [1]

  • In the sequence-to-sequence architecture, the encoder is responsible for natural language understanding (NLU) and the decoder is responsible for natural language generation (NLG)

  • The scores of the cross-connected models further improve with the additional intermediate layer, which suggests that the intermediate layer helps combine the independently trained encoder and decoder to some extent, though it is not as crucial as our initial assumption suggested (a rough fine-tuning sketch follows below)
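Continuing the hypothetical cross-connection sketch shown after the abstract, the snippet below roughly illustrates fine-tuning such a model on parallel data so that the intermediate layer (and optionally the pre-trained parts) learns the mapping; the optimizer, loss, and placeholder batch are assumptions, not the paper's actual training setup.

    # Continuing the earlier sketch: fine-tune the cross-connected model on
    # parallel data. Embeddings, data, and hyperparameters are placeholders.
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

    src_batch = torch.randn(2, 10, D_MODEL)           # embedded source sentences
    tgt_inputs = torch.randn(2, 7, D_MODEL)           # embedded (shifted) target inputs
    tgt_labels = torch.randint(0, VOCAB_TGT, (2, 7))  # gold target token ids

    logits = model(src_batch, tgt_inputs)             # (batch, tgt_len, vocab)
    loss = criterion(logits.reshape(-1, VOCAB_TGT), tgt_labels.reshape(-1))
    loss.backward()
    optimizer.step()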

Summary

Introduction

Transfer learning with pre-training and fine-tuning has pushed the state-of-the-art results in many natural language processing (NLP) tasks since the famous success of BERT [1]. This method has become a common practice in the NLP field. Neural machine translation (NMT) is such a case because parallel corpora are expensive to construct and unavailable for many languages, while monolingual corpora are relatively easy to find and large in scale. Since monolingual models are relatively easy to create, it would be beneficial if they could be reused for the machine translation task.

Methods
Results
Conclusion