Abstract

Previous studies have shown that initializing neural machine translation (NMT) models with pre-trained language models (LMs) can speed up model training and boost model performance. In this work, we identify a critical side-effect of pre-training for NMT, which stems from the discrepancy between the training objectives of LM-based pre-training and NMT. Since the LM objective learns to reconstruct a few source tokens and copy most of them, the pre-training initialization affects the copying behaviors of NMT models. We provide a quantitative analysis of copying behaviors by introducing a metric called the copying ratio, which empirically shows that pre-training-based NMT models have a larger copying ratio than standard NMT models. In response to this problem, we propose a simple and effective method named copying penalty to control copying behaviors during decoding. Extensive experiments on both in-domain and out-of-domain benchmarks show that the copying penalty consistently improves translation performance by controlling the copying behaviors of pre-training-based NMT models. Source code is freely available at this https URL.
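As a rough illustration of the decoding-time control described above (a sketch under our own assumptions, not the authors' implementation), a copying penalty can be realized by re-ranking beam-search hypotheses with a term proportional to the fraction of output tokens copied verbatim from the source; the function names and the strength parameter alpha below are hypothetical.

    from collections import Counter
    from typing import List, Tuple


    def copying_fraction(source: List[str], hypothesis: List[str]) -> float:
        """Fraction of hypothesis tokens that also occur in the source (counts clipped)."""
        if not hypothesis:
            return 0.0
        src_counts = Counter(source)
        hyp_counts = Counter(hypothesis)
        copied = sum(min(count, src_counts[tok]) for tok, count in hyp_counts.items())
        return copied / len(hypothesis)


    def rescore_with_copying_penalty(
        source: List[str],
        beam: List[Tuple[List[str], float]],  # (hypothesis tokens, log-probability)
        alpha: float = 1.0,
    ) -> List[Tuple[List[str], float]]:
        """Subtract alpha * copying_fraction from each hypothesis score, then re-rank."""
        rescored = [
            (hyp, logprob - alpha * copying_fraction(source, hyp))
            for hyp, logprob in beam
        ]
        return sorted(rescored, key=lambda item: item[1], reverse=True)

With alpha = 0 the re-ranking reduces to ordinary selection by log-probability; larger values of alpha increasingly discourage hypotheses dominated by copied source tokens.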

Highlights

  • Self-supervised pre-training (Devlin et al., 2019; Song et al., 2019), which acquires general knowledge from a large amount of unlabeled data to enable better and faster learning of downstream tasks, has an intuitive appeal for neural machine translation (NMT; Bahdanau et al., 2015; Vaswani et al., 2017)

  • We find that NMT models with pre-training are prone to generating more copied tokens

  • We introduce a copying ratio and a copying error rate to quantitatively analyze copying behaviors in NMT evaluation
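A minimal sketch of how such diagnostics can be computed per sentence, assuming the following working definitions (the paper's exact formulas may differ): the copying ratio is the fraction of output tokens that also occur in the source, and the copying error rate is the fraction of those copied tokens that are not supported by the reference.

    from collections import Counter
    from typing import List, Tuple


    def copying_stats(source: List[str], output: List[str], reference: List[str]) -> Tuple[float, float]:
        """Return (copying ratio, copying error rate) for one translated sentence."""
        src_counts = Counter(source)
        ref_counts = Counter(reference)
        # Output tokens that are verbatim copies of some source token.
        copied = [tok for tok in output if src_counts[tok] > 0]
        copying_ratio = len(copied) / max(len(output), 1)
        # Copied tokens absent from the reference are counted as copying errors.
        wrong_copies = sum(1 for tok in copied if ref_counts[tok] == 0)
        copying_error_rate = wrong_copies / max(len(copied), 1)
        return copying_ratio, copying_error_rate

Averaging these values over a test set allows a pre-training-initialized NMT model to be compared with a randomly initialized baseline.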


Summary

Introduction

Self-supervised pre-training (Devlin et al., 2019; Song et al., 2019), which acquires general knowledge from a large amount of unlabeled data to enable better and faster learning of downstream tasks, has an intuitive appeal for neural machine translation (NMT; Bahdanau et al., 2015; Vaswani et al., 2017). One direct way to utilize pre-trained knowledge is to initialize the NMT model with a pre-trained language model (LM) before training it on parallel data (Conneau and Lample, 2019; Liu et al.).

[Figure: comparison of the two training objectives. LM pre-training reconstructs masked source tokens, L_PT = −log P(x | x̂), where x̂ is the partially masked source; NMT training maps a source sentence to its translation, L_NMT = −log P(y | x). The example pairs an English source about "Military ruler Field Marshal Hussein" with the German target "Der Militärführer Feldmarschall Hussein Tantawi war anwesend."]

As a range of surface, syntactic, and semantic information has been encoded in the initialized parameters (Jawahar et al., 2019; Goldberg, 2019), they are expected to benefit NMT models and improve translation quality.
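To make the initialization step concrete, the following is a generic warm-start sketch (not the code released with the paper): parameters from a pre-trained LM checkpoint are copied into the NMT model wherever names and shapes match, after which the model is trained on parallel data as usual. The checkpoint path and function name are placeholders.

    import torch
    import torch.nn as nn


    def init_nmt_from_lm(nmt_model: nn.Module, lm_checkpoint_path: str) -> int:
        """Initialize matching NMT parameters from a pre-trained LM state dict."""
        lm_state = torch.load(lm_checkpoint_path, map_location="cpu")  # assumes a saved state dict
        nmt_state = nmt_model.state_dict()
        copied = 0
        for name, weight in lm_state.items():
            if name in nmt_state and nmt_state[name].shape == weight.shape:
                nmt_state[name] = weight
                copied += 1
        nmt_model.load_state_dict(nmt_state)
        return copied  # number of tensors initialized from the pre-trained LM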

