Abstract

We present effective pre-training strategies for neural machine translation (NMT) using parallel corpora involving a pivot language, i.e., source-pivot and pivot-target, leading to a significant improvement in source-target translation. We propose three methods to strengthen the relation among the source, pivot, and target languages during pre-training: 1) step-wise training of a single model for different language pairs, 2) an additional adapter component to smoothly connect the pre-trained encoder and decoder, and 3) cross-lingual encoder training via autoencoding of the pivot language. Our methods outperform multilingual models by up to +2.6% BLEU in the WMT 2019 French-German and German-Czech tasks. We show that our improvements also hold in zero-shot/zero-resource scenarios.
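To illustrate the adapter idea, the sketch below fits a simple linear map from mean-pooled source-encoder sentence vectors into the pivot-encoder space using source-pivot parallel data. The Procrustes/SVD solution, the mean pooling, and the function name `fit_pivot_adapter` are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

def fit_pivot_adapter(src_vecs: np.ndarray, piv_vecs: np.ndarray) -> np.ndarray:
    """Fit a linear adapter W that maps source-encoder sentence vectors into
    the pivot-encoder space (orthogonal Procrustes solution via SVD).

    src_vecs, piv_vecs: (num_sentences, hidden_dim) mean-pooled encoder states
    for the source and pivot sides of a source-pivot parallel corpus.
    """
    # Minimize ||src_vecs @ W.T - piv_vecs|| over orthogonal W.
    u, _, vt = np.linalg.svd(piv_vecs.T @ src_vecs)
    return u @ vt  # W, so that W @ src_vec is close to the matching piv_vec

# At fine-tuning time, each source-encoder state would be passed through W
# before being fed to the pre-trained pivot->target decoder.
```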

Highlights

  • Machine translation (MT) research is biased towards language pairs including English due to the ease of collecting parallel corpora

  • We pre-train NMT models for source→pivot and pivot→target, which are transferred to a source→target model

  • We show that NMT models pre-trained with our methods are highly effective in various data conditions when fine-tuned for source→target, including zero-shot/zero-resource scenarios


Summary

Introduction

Machine translation (MT) research is biased towards language pairs that include English because parallel corpora for them are easy to collect. We present novel transfer learning techniques to effectively train a single, direct NMT model for a non-English language pair. We pre-train NMT models for source→pivot and pivot→target, which are then transferred to a source→target model. To make the best use of the given source-pivot and pivot-target parallel data for the source→target direction, we devise the following techniques to smooth the discrepancy between the pre-trained and final models: step-wise pre-training, a pivot adapter, and a cross-lingual encoder. Our methods are evaluated on two non-English language pairs of the WMT 2019 news translation tasks: high-resource (French→German) and low-resource (German→Czech). We show that NMT models pre-trained with our methods are highly effective in various data conditions when fine-tuned for source→target, including zero-shot/zero-resource scenarios.
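To make the transfer concrete, here is a minimal PyTorch sketch of how the parameter hand-off could be wired: a source→pivot model and a pivot→target model are pre-trained, and their encoder and decoder respectively initialize the final source→target model. The `NMTModel` class, the shared 32k vocabulary, and the parameter-copying details are illustrative assumptions rather than the authors' implementation; in particular, the paper's step-wise recipe and any encoder-freezing schedule may differ from what is shown.

```python
import torch.nn as nn

class NMTModel(nn.Module):
    """Minimal Transformer stand-in for an NMT model (illustrative only)."""
    def __init__(self, vocab_size: int, d_model: int = 512, nhead: int = 8, layers: int = 6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True), num_layers=layers)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead, batch_first=True), num_layers=layers)
        self.out = nn.Linear(d_model, vocab_size)

VOCAB = 32000  # assumed shared subword vocabulary across source, pivot, and target

# 1) Pre-train source->pivot; its encoder will be reused later.
src_piv = NMTModel(VOCAB)
# ... train src_piv on the source-pivot corpus ...

# 2) Step-wise pre-training: train pivot->target with its encoder initialized
#    from the source->pivot encoder, so the reused encoder and decoder are not
#    pre-trained in isolation from each other.
piv_tgt = NMTModel(VOCAB)
piv_tgt.encoder.load_state_dict(src_piv.encoder.state_dict())
# ... train piv_tgt on the pivot-target corpus ...

# 3) Assemble the source->target model from the pre-trained parts and
#    fine-tune it on whatever source-target data is available (possibly none).
src_tgt = NMTModel(VOCAB)
src_tgt.embed.load_state_dict(src_piv.embed.state_dict())
src_tgt.encoder.load_state_dict(src_piv.encoder.state_dict())
src_tgt.decoder.load_state_dict(piv_tgt.decoder.state_dict())
src_tgt.out.load_state_dict(piv_tgt.out.state_dict())
```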

Related Work
Pivot-based Transfer Learning
Step-wise Pre-training
Pivot Adapter
Cross-lingual Encoder
Main Results
Large-scale Results
Conclusion