Abstract

Recently, model pretraining has been successfully applied to unsupervised and semi-supervised neural machine translation. The cross-lingual language model (XLM) uses a pretrained masked language model to initialize the encoder and decoder of the translation model, which greatly improves translation quality. However, because of a mismatch in the number of layers, the pretrained model can only initialize part of the decoder's parameters. In this paper, we use a layer-wise coordination transformer and a consistent pretraining translation transformer instead of a vanilla transformer as the translation model. The former has only an encoder; the latter has both an encoder and a decoder, but the two share exactly the same parameters. Both models guarantee that every parameter of the translation model can be initialized from the pretrained model. Experiments on Chinese–English and English–German datasets show that, compared with the vanilla transformer baseline, our models achieve better performance with fewer parameters when the parallel corpus is small.
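To make the shared-parameter idea concrete, below is a minimal PyTorch sketch: if the encoder and decoder reuse one layer stack and one token embedding, loading a pretrained masked language model into that stack initializes every translation-model parameter. This is not the authors' implementation; all class, function, and checkpoint-key names are hypothetical, and the attention masking of the actual architecture is omitted.

```python
# Minimal sketch (hypothetical names, simplified masking) of a translation model
# whose encoder and decoder share one parameter stack, so a single pretrained
# masked-LM checkpoint can initialize every parameter.

import torch
import torch.nn as nn


class SharedStackTranslationModel(nn.Module):
    def __init__(self, vocab_size: int, d_model: int = 512,
                 n_heads: int = 8, n_layers: int = 6):
        super().__init__()
        # One token embedding used for both source and target sides,
        # mirroring the shared vocabulary embedding of the pretrained model.
        self.embed = nn.Embedding(vocab_size, d_model)
        # A single stack of self-attention layers reused for both passes.
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.shared_stack = nn.TransformerEncoder(layer, n_layers)
        self.proj = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids: torch.Tensor, tgt_ids: torch.Tensor) -> torch.Tensor:
        # Encoder pass over the source sentence.
        memory = self.shared_stack(self.embed(src_ids))
        # "Decoder" pass: the same parameters process the source memory and the
        # target prefix together (causal masking omitted for brevity).
        dec_in = torch.cat([memory, self.embed(tgt_ids)], dim=1)
        out = self.shared_stack(dec_in)
        return self.proj(out[:, memory.size(1):])


def init_from_pretrained_mlm(model: SharedStackTranslationModel, ckpt_path: str) -> None:
    """Copy a pretrained masked-LM checkpoint (hypothetical key layout) into the
    shared stack; because encoder and decoder are the same module, this single
    load covers all translation parameters."""
    state = torch.load(ckpt_path, map_location="cpu")
    model.embed.load_state_dict({"weight": state["embed.weight"]})
    model.shared_stack.load_state_dict(
        {k.removeprefix("encoder."): v for k, v in state.items()
         if k.startswith("encoder.")})
```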

Highlights

  • Neural machine translation (NMT), which is trained in an end-to-end fashion [1,2,3,4], has become the mainstream approach to machine translation and has even reached human-level quality in some domains [5,6,7].

  • In order to solve these problems, we propose a new transformer variant based on the vanilla transformer and the layer-wise coordination transformer, called the consistent pretraining translation transformer (CPTT).

  • The pretrained model shares token embeddings between the source and target languages, whereas the vanilla transformer NMT model in XLM does not share token embeddings between its encoder and decoder (see the sketch after this list).
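As a concrete illustration of that mismatch, the following sketch (hypothetical names, not taken from XLM's code) ties a single token-embedding matrix across the encoder input, the decoder input, and the output projection, so the translation model matches the pretrained model's shared vocabulary embedding.

```python
# Minimal sketch of tying one token-embedding matrix across encoder, decoder,
# and output projection (assumed sizes; not the paper's actual code).

import torch.nn as nn

vocab_size, d_model = 32000, 512

shared_embed = nn.Embedding(vocab_size, d_model)

encoder_embed = shared_embed               # source side reuses the same weights
decoder_embed = shared_embed               # target side reuses the same weights

output_proj = nn.Linear(d_model, vocab_size, bias=False)
output_proj.weight = shared_embed.weight   # tie the softmax projection as well
```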


Summary

Introduction

Neural machine translation (NMT), which is trained in an end-to-end fashion [1,2,3,4], has become the mainstream approach to machine translation and has even reached human-level quality in some domains [5,6,7]. For low-resource semi-supervised neural machine translation, XLM first trains a transformer encoder on both source and target monolingual data through masked language modeling, and the pretrained model is then used to initialize the encoder and decoder of the translation transformer. We keep masked language modeling as the pretraining task, but we use two transformer variants instead of the vanilla transformer as the translation model: one is the layer-wise coordination transformer [20], and the other is the consistent pretraining translation transformer. Our main contributions are: (1) to keep the model consistent between pretraining and translation, we propose to use the layer-wise coordination transformer in place of the vanilla transformer as the translation model; and (2) based on the vanilla transformer and the layer-wise coordination transformer, we propose the consistent pretraining translation transformer, which achieves better performance in the pretraining and fine-tuning setting.
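For readers unfamiliar with the pretraining task, here is a brief sketch of a masked language modeling objective under assumed details (token ids, masking rate). XLM's full recipe also includes additional replacement strategies (e.g., keeping or randomizing some selected tokens), which are omitted here.

```python
# Sketch of a masked language modeling loss: a fraction of tokens is replaced by
# a [MASK] id and the model is trained to recover the original tokens there.
# MASK_ID, PAD_ID, and MASK_PROB are assumed values, not taken from the paper.

import torch
import torch.nn.functional as F

MASK_ID = 4          # hypothetical id of the [MASK] token
PAD_ID = 0           # hypothetical padding id
MASK_PROB = 0.15     # typical masking rate


def mask_tokens(token_ids: torch.Tensor):
    """Return (masked inputs, labels); labels are -100 where no prediction is asked."""
    labels = token_ids.clone()
    # Choose ~15% of non-padding positions to mask.
    probs = torch.full(token_ids.shape, MASK_PROB)
    chosen = torch.bernoulli(probs).bool() & (token_ids != PAD_ID)
    labels[~chosen] = -100                      # ignored by the loss
    inputs = token_ids.clone()
    inputs[chosen] = MASK_ID                    # replace chosen tokens with [MASK]
    return inputs, labels


def mlm_loss(model, token_ids: torch.Tensor) -> torch.Tensor:
    inputs, labels = mask_tokens(token_ids)
    logits = model(inputs)                      # (batch, seq_len, vocab_size)
    return F.cross_entropy(logits.view(-1, logits.size(-1)),
                           labels.view(-1), ignore_index=-100)
```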

Related Works
Background
Transformer-Based NMT
Our Models
Layer-Wise Coordination Transformer
Consistent Pretraining Translation Transformer
Other Model Details
Datasets and Preprocessing
Model Configurations
Results and Analysis
Ablation Study
The Influence of Parallel Corpus Size
Conclusions