Abstract

When building machine translation systems, one often needs to make the best out of heterogeneous sets of parallel data in training, and to robustly handle inputs from unexpected domains in testing. This multi-domain scenario has attracted a lot of recent work that falls under the general umbrella of transfer learning. In this study, we revisit multi-domain machine translation, with the aim of formulating the motivations for developing such systems and the associated expectations with respect to performance. Our experiments with a large sample of multi-domain systems show that most of these expectations are hardly met, and suggest that further work is needed to better analyze the current behaviour of multi-domain systems and to make them fully deliver on their promises.

Highlights

  • Data-based Machine Translation (MT), whether statistical or neural, rests on well-understood machine learning principles

  • We have carefully reconsidered the idea of multi-domain machine translation, which seems to be taken for granted in many recent studies

  • We have laid out a series of requirements that multi-domain machine translation (MDMT) systems should meet, together with associated test procedures

Introduction

Data-based Machine Translation (MT), whether statistical or neural, rests on well-understood machine learning principles. Given a training sample of matched source-target sentence pairs (f, e) drawn from an underlying distribution Ds, a model parameterized by θ (here, a translation function hθ) is trained by minimizing the empirical expectation of a loss function ℓ(hθ(f), e). This approach ensures that the translation loss remains low, in expectation, when translating more sentences drawn from the same distribution. It offers no such guarantee, however, when test sentences come from a different distribution Dt; bridging this mismatch is the object of domain adaptation (DA). Various techniques exist to handle both the situation where a (small) training sample drawn from Dt is available and the situation where only source-side (or target-side) sentences from Dt are available (see Foster and Kuhn [2007], Bertoldi and Federico [2009], and Axelrod et al. [2011] for proposals from the statistical MT era, or Chu and Wang [2018] for a recent survey of DA for neural MT).
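To make this training criterion concrete, the empirical risk minimization described above can be written as follows (a notational sketch: the sample size N and the loss symbol ℓ are assumptions, since the excerpt leaves them implicit):

    \hat{\theta} = \operatorname*{arg\,min}_{\theta} \; \frac{1}{N} \sum_{i=1}^{N} \ell\big(h_{\theta}(f_i), e_i\big), \qquad (f_i, e_i) \sim D_s

Minimizing this empirical average stands in for minimizing the true expected loss under Ds, which is why performance guarantees only carry over to test sentences drawn from that same distribution.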
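As an illustration of the first setting, where a small parallel sample from Dt is available, a common technique is to continue training (fine-tune) a model already trained on Ds. The sketch below is a minimal PyTorch version of that recipe; the model interface, batch layout, and hyperparameters are illustrative assumptions, not the systems evaluated in this study.

    import torch
    from torch.utils.data import DataLoader

    def fine_tune(model, in_domain_pairs, collate_fn, epochs=3, lr=1e-5):
        """Continue training a pre-trained MT model h_theta on a (small)
        parallel sample drawn from the target-domain distribution Dt."""
        loader = DataLoader(in_domain_pairs, batch_size=32, shuffle=True,
                            collate_fn=collate_fn)
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        # Token-level cross-entropy, the usual instantiation of the loss
        # function; token id 0 is assumed to be padding.
        loss_fn = torch.nn.CrossEntropyLoss(ignore_index=0)
        model.train()
        for _ in range(epochs):
            for src, tgt_in, tgt_out in loader:
                logits = model(src, tgt_in)  # h_theta(f): (batch, seq, vocab)
                # CrossEntropyLoss expects class scores as (batch, vocab, seq)
                loss = loss_fn(logits.transpose(1, 2), tgt_out)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
        return model

A small learning rate and few epochs are typical here, so that the model adapts to Dt without catastrophically forgetting what it learned from Ds.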
