Abstract

This work uses sequence-to-sequence (seq2seq) models pre-trained on monolingual corpora for machine translation. We pre-train two seq2seq models on monolingual corpora for the source and target languages, then combine the encoder of the source-language model and the decoder of the target-language model, i.e., the cross-connection. Since the two modules are pre-trained completely independently, we add an intermediate layer between the pre-trained encoder and decoder to help them map to each other. These monolingual pre-trained models can serve as a multilingual pre-trained model because any model can be cross-connected with a model pre-trained on any other language, while their capacity is not affected by the number of languages. We demonstrate that our method significantly improves translation performance over a randomly initialized baseline. Moreover, we analyze the appropriate choice of the intermediate layer, the importance of each part of a pre-trained model, and how performance changes with the size of the bitext.
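As a rough, minimal sketch of the cross-connection idea (not the authors' implementation), the snippet below pairs a source-language Transformer encoder with a target-language Transformer decoder and inserts a small intermediate adapter between them; the module sizes, the adapter design, and the stand-in PyTorch modules are all illustrative assumptions.

    # A minimal, hypothetical sketch of the cross-connection: the encoder of a
    # source-language seq2seq model is paired with the decoder of a
    # target-language model, with an intermediate layer in between.
    import torch
    import torch.nn as nn

    D_MODEL, VOCAB_TGT = 512, 32000  # illustrative sizes

    class CrossConnected(nn.Module):
        def __init__(self, src_encoder, tgt_decoder, tgt_generator):
            super().__init__()
            self.src_encoder = src_encoder      # taken from the source-language pre-trained model
            self.intermediate = nn.Sequential(  # learned when fine-tuning on bitext
                nn.Linear(D_MODEL, D_MODEL),
                nn.ReLU(),
                nn.Linear(D_MODEL, D_MODEL),
            )
            self.tgt_decoder = tgt_decoder      # taken from the target-language pre-trained model
            self.tgt_generator = tgt_generator  # output projection of the target-language model

        def forward(self, src_emb, tgt_emb):
            memory = self.src_encoder(src_emb)       # encode the source sentence
            memory = self.intermediate(memory)       # map source representations toward the target space
            out = self.tgt_decoder(tgt_emb, memory)  # target decoder attends to the mapped memory
            return self.tgt_generator(out)           # logits over the target vocabulary

    # Stand-ins for the pre-trained modules (randomly initialized here for illustration).
    src_encoder = nn.TransformerEncoder(
        nn.TransformerEncoderLayer(D_MODEL, nhead=8, batch_first=True), num_layers=6)
    tgt_decoder = nn.TransformerDecoder(
        nn.TransformerDecoderLayer(D_MODEL, nhead=8, batch_first=True), num_layers=6)
    model = CrossConnected(src_encoder, tgt_decoder, nn.Linear(D_MODEL, VOCAB_TGT))

    src = torch.randn(2, 10, D_MODEL)  # (batch, src_len, d_model) embedded source tokens
    tgt = torch.randn(2, 7, D_MODEL)   # (batch, tgt_len, d_model) embedded target tokens
    print(model(src, tgt).shape)       # torch.Size([2, 7, 32000])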

Highlights

  • Transfer learning with pre-training and fine-tuning has pushed the state-of-the-art results in many natural language processing (NLP) tasks since the famous success of BERT [1]

  • In the sequence-to-sequence architecture, the encoder is responsible for natural language understanding (NLU) and the decoder is responsible for natural language generation (NLG)

  • The scores of the cross-connected models further improve with the additional intermediate layer, which suggests that the intermediate layer helps combine the independently trained encoder and decoder to some extent, though it is not as crucial as our initial assumption suggested (a rough fine-tuning sketch follows below)
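Continuing the hypothetical cross-connection sketch shown after the abstract, the snippet below roughly illustrates fine-tuning such a model on parallel data so that the intermediate layer (and optionally the pre-trained parts) learns the mapping; the optimizer, loss, and placeholder batch are assumptions, not the paper's actual training setup.

    # Continuing the earlier sketch: fine-tune the cross-connected model on
    # parallel data. Embeddings, data, and hyperparameters are placeholders.
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

    src_batch = torch.randn(2, 10, D_MODEL)           # embedded source sentences
    tgt_inputs = torch.randn(2, 7, D_MODEL)           # embedded (shifted) target inputs
    tgt_labels = torch.randint(0, VOCAB_TGT, (2, 7))  # gold target token ids

    logits = model(src_batch, tgt_inputs)             # (batch, tgt_len, vocab)
    loss = criterion(logits.reshape(-1, VOCAB_TGT), tgt_labels.reshape(-1))
    loss.backward()
    optimizer.step()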

Summary

Introduction

Transfer learning with pre-training and fine-tuning has pushed the state-of-the-art results in many natural language processing (NLP) tasks since the famous success of BERT [1]. This method has become a common practice in the NLP field. Neural machine translation (NMT) is such a case because parallel corpora are expensive to construct and unavailable for many languages, while monolingual corpora are relatively easy to find and large in scale. Since monolingual models are relatively easy to create, it would be beneficial if they could be reused for the machine translation task.

Methods
Results
Conclusion