Abstract
While synthetic bilingual corpora have demonstrated their effectiveness in low-resource neural machine translation (NMT), adding more synthetic data often deteriorates translation performance. In this work, we propose alternated training with synthetic and authentic data for NMT. The basic idea is to alternate synthetic and authentic corpora iteratively during training. Compared with previous work, we introduce authentic data as guidance to prevent the training of NMT models from being disturbed by noisy synthetic data. Experiments on Chinese-English and German-English translation tasks show that our approach improves the performance over several strong baselines. We visualize the BLEU landscape to further investigate the role of authentic and synthetic data during alternated training. From the visualization, we find that authentic data helps to direct the NMT model parameters towards points with higher BLEU scores and leads to consistent translation performance improvement.
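The alternation idea described above can be sketched as a simple training schedule. The helper names below (`alternation_schedule`, `train_alternated`, `model_update`) are hypothetical illustrations, not the authors' implementation; a real NMT setup would plug in an actual optimizer step in place of the callback.

```python
from itertools import cycle, islice

def alternation_schedule(num_phases, pattern=("synthetic", "authentic")):
    """Return which corpus to train on in each phase.

    Repeats the given pattern, a simplified stand-in for iteratively
    alternating synthetic and authentic corpora during training.
    """
    return list(islice(cycle(pattern), num_phases))

def train_alternated(model_update, synthetic_batches, authentic_batches, num_phases):
    """Alternate full passes over synthetic and authentic data.

    `model_update` is a hypothetical callback that applies one
    training step to a batch; the corpus label lets the caller
    adjust behaviour (e.g. learning rate) per data source.
    """
    sources = {"synthetic": synthetic_batches, "authentic": authentic_batches}
    for corpus in alternation_schedule(num_phases):
        for batch in sources[corpus]:
            model_update(batch, corpus)
```

For example, `alternation_schedule(4)` yields `["synthetic", "authentic", "synthetic", "authentic"]`, so every synthetic phase is followed by an authentic phase that re-anchors the model on clean data.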
Highlights
While recent years have witnessed the rapid development of Neural Machine Translation (NMT) (Sutskever et al., 2014; Bahdanau et al., 2015; Gehring et al., 2017; Vaswani et al., 2017), it heavily relies on large-scale, high-quality bilingual corpora
We propose alternated training with synthetic and authentic data for neural machine translation
We introduce authentic data as guidance to prevent the training of neural machine translation (NMT) models from being disturbed by noisy synthetic data
Summary
While recent years have witnessed the rapid development of Neural Machine Translation (NMT) (Sutskever et al., 2014; Bahdanau et al., 2015; Gehring et al., 2017; Vaswani et al., 2017), it heavily relies on large-scale, high-quality bilingual corpora. One direction to alleviate the problem is to add noise or a special tag on the source side of synthetic data, which enables NMT models to distinguish between authentic and synthetic data (Edunov et al., 2018; Caswell et al., 2019). Another direction is to filter or evaluate the synthetic data by calculating confidence over corpora, making NMT models better exploit synthetic data (Imamura et al., 2018; Wang et al., 2019). Experiments on Chinese-English translation tasks show that our approach improves the performance over strong baselines.
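The tagging direction mentioned above (Caswell et al., 2019) amounts to prepending a reserved token to synthetic source sentences so the model can tell them apart from authentic ones. A minimal sketch, assuming a hypothetical `<BT>` tag token reserved in the vocabulary:

```python
def tag_synthetic(source_sentence, tag="<BT>"):
    """Prepend a reserved tag token to a back-translated source
    sentence, letting the NMT model distinguish synthetic from
    authentic training data at the input level."""
    return f"{tag} {source_sentence}"
```

Authentic sentences are left untagged, so e.g. `tag_synthetic("wie geht es dir")` produces `"<BT> wie geht es dir"` while the authentic side of the corpus is fed to the model unchanged.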