Abstract

Neural machine translation (NMT) has achieved prominent results in many machine translation tasks. However, in some domain-specific tasks, only corpora from similar domains can improve translation performance; if out-of-domain corpora are added directly to the in-domain corpus, translation performance may even degrade. Domain adaptation techniques are therefore essential for NMT. Most existing domain adaptation methods were designed for conventional phrase-based machine translation, and for NMT domain adaptation there have been only a few studies, on topics such as fine-tuning, domain tags, and domain features. In this paper, we pursue four goals for sentence-level NMT domain adaptation. First, we exploit the NMT system's internal sentence embeddings and use sentence embedding similarity to select out-of-domain sentences that are close to the in-domain corpus. Second, we propose three sentence weighting methods, i.e., sentence weighting, domain weighting, and batch weighting, to balance the data distribution during NMT training. Third, we propose dynamic training methods that adjust the sentence selection and weighting during NMT training. Fourth, to address the multidomain problem in real-world NMT scenarios, where the domain distributions of training and testing data often mismatch, we propose a multidomain sentence weighting method that balances the domain distribution of the training data and matches the domain distributions of training and testing data. The proposed methods are evaluated on the International Workshop on Spoken Language Translation (IWSLT) English-to-French/German tasks and on a multidomain English-to-French task. Empirical results show that the sentence selection and weighting methods significantly improve NMT performance, outperforming the existing baselines.
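
The abstract names the selection and weighting techniques without detailing them. The plain-Python sketch below illustrates one way the first two ideas could be realized: out-of-domain sentences are ranked by the cosine similarity between their embeddings and the centroid of the in-domain embeddings, and the similarity scores are reused as per-sentence training weights. This is a minimal sketch under our own assumptions (centroid-based scoring and similarity-derived weights); the function names and parameters are hypothetical and not taken from the paper.

    import numpy as np

    def rank_by_similarity(in_domain_emb, out_domain_emb):
        """Cosine similarity of each out-of-domain sentence embedding
        to the mean (centroid) of the in-domain sentence embeddings.
        Assumption: the centroid is a reasonable proxy for the domain."""
        centroid = in_domain_emb.mean(axis=0)
        centroid /= np.linalg.norm(centroid)
        normed = out_domain_emb / np.linalg.norm(out_domain_emb, axis=1, keepdims=True)
        return normed @ centroid  # shape (n_out,); higher = closer to in-domain

    def select_and_weight(sims, top_k):
        """Keep the top_k most similar sentences (sentence selection) and
        rescale their scores to [0, 1] training weights (sentence weighting).
        Hypothetical weighting scheme, not the paper's formula."""
        idx = np.argsort(-sims)[:top_k]
        kept = sims[idx]
        weights = (kept - kept.min()) / (kept.max() - kept.min() + 1e-9)
        return idx, weights

    # Toy usage with random vectors standing in for NMT encoder sentence states.
    rng = np.random.default_rng(0)
    in_emb = rng.normal(size=(100, 512))    # in-domain sentence embeddings
    out_emb = rng.normal(size=(5000, 512))  # out-of-domain sentence embeddings
    idx, w = select_and_weight(rank_by_similarity(in_emb, out_emb), top_k=1000)

In such a setup the weights could scale each selected sentence's loss during training, so that out-of-domain sentences closest to the in-domain corpus contribute most; the paper's domain weighting and batch weighting variants apply the same idea at coarser granularities.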
