Abstract

We present effective pre-training strategies for neural machine translation (NMT) using parallel corpora involving a pivot language, i.e., source-pivot and pivot-target, leading to a significant improvement in source-target translation. We propose three methods to strengthen the relation among the source, pivot, and target languages during pre-training: 1) step-wise training of a single model for different language pairs, 2) an additional adapter component to smoothly connect the pre-trained encoder and decoder, and 3) cross-lingual encoder training via autoencoding of the pivot language. Our methods outperform multilingual models by up to +2.6% BLEU in the WMT 2019 French-German and German-Czech tasks. We show that our improvements also hold in zero-shot/zero-resource scenarios.
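To illustrate the adapter idea, the sketch below fits a simple linear map from mean-pooled source-encoder sentence vectors into the pivot-encoder space using source-pivot parallel data. The Procrustes/SVD solution, the mean pooling, and the function name `fit_pivot_adapter` are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

def fit_pivot_adapter(src_vecs: np.ndarray, piv_vecs: np.ndarray) -> np.ndarray:
    """Fit a linear adapter W that maps source-encoder sentence vectors into
    the pivot-encoder space (orthogonal Procrustes solution via SVD).

    src_vecs, piv_vecs: (num_sentences, hidden_dim) mean-pooled encoder states
    for the source and pivot sides of a source-pivot parallel corpus.
    """
    # Minimize ||src_vecs @ W.T - piv_vecs|| over orthogonal W.
    u, _, vt = np.linalg.svd(piv_vecs.T @ src_vecs)
    return u @ vt  # W, so that W @ src_vec is close to the matching piv_vec

# At fine-tuning time, each source-encoder state would be passed through W
# before being fed to the pre-trained pivot->target decoder.
```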

Highlights

  • Machine translation (MT) research is biased towards language pairs including English due to the ease of collecting parallel corpora

  • We pre-train NMT models for source→pivot and pivot→target, which are transferred to a source→target model

  • We show that NMT models pre-trained with our methods are highly effective in various data conditions when fine-tuned for source→target, including zero-shot/zero-resource scenarios


Summary

Introduction

Machine translation (MT) research is biased towards language pairs that include English because parallel corpora for them are easy to collect. We present novel transfer learning techniques to effectively train a single, direct NMT model for a non-English language pair. We pre-train NMT models for source→pivot and pivot→target, which are then transferred to a source→target model. To make the best use of the given source-pivot and pivot-target parallel data for the source→target direction, we devise the following techniques to smooth the discrepancy between the pre-trained and final models: step-wise pre-training, a pivot adapter, and a cross-lingual encoder. Our methods are evaluated on two non-English language pairs of the WMT 2019 news translation tasks: high-resource (French→German) and low-resource (German→Czech). We show that NMT models pre-trained with our methods are highly effective in various data conditions when fine-tuned for source→target, including zero-shot/zero-resource scenarios.
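To make the transfer concrete, here is a minimal PyTorch sketch of how the parameter hand-off could be wired: a source→pivot model and a pivot→target model are pre-trained, and their encoder and decoder respectively initialize the final source→target model. The `NMTModel` class, the shared 32k vocabulary, and the parameter-copying details are illustrative assumptions rather than the authors' implementation; in particular, the paper's step-wise recipe and any encoder-freezing schedule may differ from what is shown.

```python
import torch.nn as nn

class NMTModel(nn.Module):
    """Minimal Transformer stand-in for an NMT model (illustrative only)."""
    def __init__(self, vocab_size: int, d_model: int = 512, nhead: int = 8, layers: int = 6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True), num_layers=layers)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead, batch_first=True), num_layers=layers)
        self.out = nn.Linear(d_model, vocab_size)

VOCAB = 32000  # assumed shared subword vocabulary across source, pivot, and target

# 1) Pre-train source->pivot; its encoder will be reused later.
src_piv = NMTModel(VOCAB)
# ... train src_piv on the source-pivot corpus ...

# 2) Step-wise pre-training: train pivot->target with its encoder initialized
#    from the source->pivot encoder, so the reused encoder and decoder are not
#    pre-trained in isolation from each other.
piv_tgt = NMTModel(VOCAB)
piv_tgt.encoder.load_state_dict(src_piv.encoder.state_dict())
# ... train piv_tgt on the pivot-target corpus ...

# 3) Assemble the source->target model from the pre-trained parts and
#    fine-tune it on whatever source-target data is available (possibly none).
src_tgt = NMTModel(VOCAB)
src_tgt.embed.load_state_dict(src_piv.embed.state_dict())
src_tgt.encoder.load_state_dict(src_piv.encoder.state_dict())
src_tgt.decoder.load_state_dict(piv_tgt.decoder.state_dict())
src_tgt.out.load_state_dict(piv_tgt.out.state_dict())
```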

Related Work
Pivot-based Transfer Learning
Step-wise Pre-training
Pivot Adapter
Cross-lingual Encoder
Main Results
Large-scale Results
Conclusion