Abstract

When training multilingual machine translation (MT) models that can translate to/from multiple languages, we are faced with imbalanced training sets: some languages have much more training data than others. Standard practice is to up-sample less-resourced languages to increase their representation, and the degree of up-sampling has a large effect on overall performance. In this paper, we propose a method that instead automatically learns how to weight training data through a data scorer that is optimized to maximize performance on all test languages. Experiments on two sets of languages under both one-to-many and many-to-one MT settings show that our method not only consistently outperforms heuristic baselines in terms of average performance, but also offers flexible control over which languages' performance is prioritized.
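The up-sampling heuristic the abstract refers to is commonly implemented as temperature-based sampling: each language's sampling probability is its data share raised to 1/τ. The sketch below is a minimal illustration of this idea; the function name, the default τ, and the example sizes are our own choices, not from the paper.

```python
def temperature_sampling_probs(sizes, tau=5.0):
    """Per-language sampling probabilities from dataset sizes.

    Each probability is proportional to (n_lang / n_total) ** (1 / tau):
    tau = 1 recovers proportional sampling, while large tau approaches
    uniform sampling, i.e. heavy up-sampling of low-resource languages.
    """
    total = sum(sizes.values())
    unnorm = {lang: (n / total) ** (1.0 / tau) for lang, n in sizes.items()}
    z = sum(unnorm.values())
    return {lang: p / z for lang, p in unnorm.items()}

# Hypothetical example: one high-resource and one low-resource language.
probs = temperature_sampling_probs({"fra": 1_000_000, "aze": 10_000})
```

With τ = 5, the low-resource language's probability is far above its raw data share, which is exactly the knob whose setting the paper argues is both influential and hard to tune by hand.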

Highlights

  • Multilingual models are trained to process different languages in a single model, and have been applied to a wide variety of NLP tasks such as text classification (Klementiev et al., 2012; Chen et al., 2018a), syntactic analysis (Plank et al., 2016; Ammar et al., 2016), named-entity recognition (Xie et al., 2018; Wu and Dredze, 2019), and machine translation (MT) (Dong et al., 2015; Johnson et al., 2016).

  • A common problem with multilingual training is that the data from different languages are both heterogeneous and imbalanced

  • While low-resource languages (LRLs) will often benefit from transfer from other languages, for languages where sufficient data exists, performance will often decrease due to interference from the heterogeneous nature of the data

Summary

Introduction

Multilingual models are trained to process different languages in a single model, and have been applied to a wide variety of NLP tasks such as text classification (Klementiev et al., 2012; Chen et al., 2018a), syntactic analysis (Plank et al., 2016; Ammar et al., 2016), named-entity recognition (Xie et al., 2018; Wu and Dredze, 2019), and machine translation (MT) (Dong et al., 2015; Johnson et al., 2016). Arivazhagan et al. (2019) find that the exact value of the temperature term used for up-sampling significantly affects results, and we further show in experiments that the ideal temperature varies significantly from one experimental setting to another. This heuristic also ignores factors other than data size that affect the interaction between different languages, despite the fact that language similarity has been empirically shown to be important in examinations of cross-lingual transfer learning (Wang and Neubig, 2019; Lin et al., 2019). In contrast, our formulation has no heuristic temperatures, and enables the language scorer to consider the interaction between languages. Based on this formulation, we propose an algorithm that improves the ability of Differentiable Data Selection (DDS) to optimize multiple model objectives, which we name MultiDDS. We demonstrate that MultiDDS provides a flexible framework that allows the user to define a variety of optimization objectives for multilingual models.
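The core signal in Differentiable Data Selection is to reward training data whose gradient aligns with the gradient of the development objective, and to update a scorer (here, per-language logits) with that reward. The sketch below illustrates this idea only; the function names, the cosine-similarity reward, and the REINFORCE-style logit update are simplified assumptions, not the paper's exact algorithm.

```python
import math

def gradient_alignment_reward(train_grad, dev_grad):
    """Reward a language's training batch by the cosine similarity
    between its gradient and the gradient of the dev objective:
    data that moves the model toward better dev performance scores high."""
    dot = sum(t * d for t, d in zip(train_grad, dev_grad))
    norm = math.sqrt(sum(t * t for t in train_grad)) * \
           math.sqrt(sum(d * d for d in dev_grad)) + 1e-12
    return dot / norm

def update_language_scores(logits, rewards, lr=0.1):
    """REINFORCE-style update of per-language logits: languages with
    higher gradient-alignment rewards are up-weighted, and a softmax
    over the logits gives the next sampling distribution."""
    new_logits = {lang: v + lr * rewards[lang] for lang, v in logits.items()}
    m = max(new_logits.values())
    exp = {lang: math.exp(v - m) for lang, v in new_logits.items()}
    z = sum(exp.values())
    return new_logits, {lang: e / z for lang, e in exp.items()}
```

Unlike a fixed temperature, this distribution changes over training as the rewards change, which is how the learned scorer can account for inter-language interactions that a size-only heuristic ignores.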

Multilingual Training Preliminaries
Differentiable Data Selection
DDS for Multilingual Training
Stabilized Multi-objective Training
Data and Settings
Experiment Setup
Main Results
Prioritizing what to Optimize
Method
Learned Language Distributions
Effect of Stabilized Rewards
Related Work
Conclusion
Effect of Step-ahead Reward

