Abstract

When training multilingual machine translation (MT) models that can translate to/from multiple languages, we are faced with imbalanced training sets: some languages have much more training data than others. Standard practice is to up-sample less-resourced languages to increase their representation, and the degree of up-sampling has a large effect on overall performance. In this paper, we propose a method that instead automatically learns how to weight training data through a data scorer that is optimized to maximize performance on all test languages. Experiments on two sets of languages under both one-to-many and many-to-one MT settings show that our method not only consistently outperforms heuristic baselines in terms of average performance, but also offers flexible control over which languages' performance is prioritized.
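The up-sampling heuristic the abstract refers to is commonly implemented as temperature-based sampling: each language's sampling probability is its data share raised to 1/τ. The sketch below is a minimal illustration of this idea; the function name, the default τ, and the example sizes are our own choices, not from the paper.

```python
def temperature_sampling_probs(sizes, tau=5.0):
    """Per-language sampling probabilities from dataset sizes.

    Each probability is proportional to (n_lang / n_total) ** (1 / tau):
    tau = 1 recovers proportional sampling, while large tau approaches
    uniform sampling, i.e. heavy up-sampling of low-resource languages.
    """
    total = sum(sizes.values())
    unnorm = {lang: (n / total) ** (1.0 / tau) for lang, n in sizes.items()}
    z = sum(unnorm.values())
    return {lang: p / z for lang, p in unnorm.items()}

# Hypothetical example: one high-resource and one low-resource language.
probs = temperature_sampling_probs({"fra": 1_000_000, "aze": 10_000})
```

With τ = 5, the low-resource language's probability is far above its raw data share, which is exactly the knob whose setting the paper argues is both influential and hard to tune by hand.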

Highlights

  • Multilingual models are trained to process different languages in a single model, and have been applied to a wide variety of NLP tasks such as text classification (Klementiev et al., 2012; Chen et al., 2018a), syntactic analysis (Plank et al., 2016; Ammar et al., 2016), named-entity recognition (Xie et al., 2018; Wu and Dredze, 2019), and machine translation (MT) (Dong et al., 2015; Johnson et al., 2016).

  • A common problem with multilingual training is that the data from different languages are both heterogeneous and imbalanced

  • While low-resource languages (LRLs) will often benefit from transfer from other languages, for languages where sufficient data exists, performance will often decrease due to interference from the heterogeneous nature of the data

Summary

Introduction

Multilingual models are trained to process different languages in a single model, and have been applied to a wide variety of NLP tasks such as text classification (Klementiev et al., 2012; Chen et al., 2018a), syntactic analysis (Plank et al., 2016; Ammar et al., 2016), named-entity recognition (Xie et al., 2018; Wu and Dredze, 2019), and machine translation (MT) (Dong et al., 2015; Johnson et al., 2016). Arivazhagan et al. (2019) find that the exact value of the temperature term used for up-sampling significantly affects results, and we further show in experiments that the ideal temperature varies significantly from one experimental setting to another. This heuristic also ignores factors other than data size that affect the interaction between different languages, despite the fact that language similarity has been empirically shown to be important in examinations of cross-lingual transfer learning (Wang and Neubig, 2019; Lin et al., 2019). In contrast, our formulation has no heuristic temperatures, and enables the language scorer to consider the interaction between languages. Based on this formulation, we propose an algorithm that improves the ability of Differentiable Data Selection (DDS) to optimize multiple model objectives, which we name MultiDDS. We demonstrate that MultiDDS provides a flexible framework that allows the user to define a variety of optimization objectives for multilingual models.
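The core signal in Differentiable Data Selection is to reward training data whose gradient aligns with the gradient of the development objective, and to update a scorer (here, per-language logits) with that reward. The sketch below illustrates this idea only; the function names, the cosine-similarity reward, and the REINFORCE-style logit update are simplified assumptions, not the paper's exact algorithm.

```python
import math

def gradient_alignment_reward(train_grad, dev_grad):
    """Reward a language's training batch by the cosine similarity
    between its gradient and the gradient of the dev objective:
    data that moves the model toward better dev performance scores high."""
    dot = sum(t * d for t, d in zip(train_grad, dev_grad))
    norm = math.sqrt(sum(t * t for t in train_grad)) * \
           math.sqrt(sum(d * d for d in dev_grad)) + 1e-12
    return dot / norm

def update_language_scores(logits, rewards, lr=0.1):
    """REINFORCE-style update of per-language logits: languages with
    higher gradient-alignment rewards are up-weighted, and a softmax
    over the logits gives the next sampling distribution."""
    new_logits = {lang: v + lr * rewards[lang] for lang, v in logits.items()}
    m = max(new_logits.values())
    exp = {lang: math.exp(v - m) for lang, v in new_logits.items()}
    z = sum(exp.values())
    return new_logits, {lang: e / z for lang, e in exp.items()}
```

Unlike a fixed temperature, this distribution changes over training as the rewards change, which is how the learned scorer can account for inter-language interactions that a size-only heuristic ignores.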

Multilingual Training Preliminaries
Differentiable Data Selection
DDS for Multilingual Training
Stabilized Multi-objective Training
Data and Settings
Experiment Setup
Main Results
Prioritizing what to Optimize
Method
Learned Language Distributions
Effect of Stabilized Rewards
Related Work
Conclusion
Effect of Step-ahead Reward

