Abstract

Exposing diverse subword segmentations to neural machine translation (NMT) models often improves the robustness of machine translation, as NMT models can experience various subword candidates. However, the diversification of subword segmentations mostly relies on pre-trained subword language models, from which erroneous segmentations of unseen words are less likely to be sampled. In this paper, we present adversarial subword regularization (ADVSR) to study whether gradient signals during training can be a substitute criterion for exposing diverse subword segmentations. We experimentally show that our model-based adversarial samples effectively encourage NMT models to be less sensitive to segmentation errors and improve the performance of NMT models on low-resource and out-of-domain datasets.

Highlights

  • Subword segmentation is a method of segmenting an input sentence into a sequence of subword units (Sennrich et al., 2016; Wu et al., 2016; Kudo, 2018); a sampling sketch is given after this list

  • Our experiments show that neural machine translation (NMT) models trained with adversarial subword regularization (ADVSR) improve over baseline NMT models by up to 3.2 BLEU on the IWSLT datasets while outperforming the standard subword regularization method

  • Exposing multiple subword candidates to NMT models shows superior performance in domain adaptation, which matches the finding of Müller et al. (2019)
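
For reference, the stochastic segmentation used in standard subword regularization (Kudo, 2018) can be reproduced with the SentencePiece library's unigram sampling. The snippet below is a minimal sketch; the model path "spm.model" and the example sentence are placeholders, not artifacts from the paper.

    # Minimal sketch: sampling diverse subword segmentations with
    # SentencePiece's unigram LM (Kudo, 2018). "spm.model" is a placeholder
    # path to an already-trained SentencePiece model.
    import sentencepiece as spm

    sp = spm.SentencePieceProcessor(model_file="spm.model")
    sentence = "unobtainable results"

    # Deterministic (single-best) segmentation.
    print(sp.encode(sentence, out_type=str))

    # Stochastic segmentations: nbest_size=-1 samples over the full lattice;
    # alpha smooths the unigram LM distribution (smaller = more diverse).
    for _ in range(3):
        print(sp.encode(sentence, out_type=str,
                        enable_sampling=True, alpha=0.1, nbest_size=-1))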

Summary

Introduction

Subword segmentation is a method of segmenting an input sentence into a sequence of subword units (Sennrich et al., 2016; Wu et al., 2016; Kudo, 2018). Subword regularization (Kudo, 2018) relies on unigram language models to sample segmentation candidates, where the language models are optimized based on corpus-level statistics from the training data with no regard to the translation task objective. This causes NMT models to experience only a limited set of subword candidates, namely those frequently observed in the training data. We adopt the adversarial training framework (Goodfellow et al., 2014; Miyato et al., 2016; Ebrahimi et al., 2017; Cheng et al., 2019) to search for subword segmentations that effectively regularize NMT models. As it is computationally expensive to exactly estimate r in Eq. 3, Miyato et al. (2016) resort to the linear approximation method (Goodfellow et al., 2014), where r_i is approximated as

    r_i = ε · g_i / ||g||_2,  with g_i = ∇_{e(x_i)} L(x, y; θ),

where e(x_i) is the embedding of the i-th subword, L is the translation loss, and ε controls the norm of the perturbation.
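
To make the linear approximation concrete, the following is a minimal PyTorch sketch (our own illustration, not the authors' released code) of first-order scoring of subword substitutions: the gradient g_i of the loss with respect to each input embedding estimates how much replacing subword x_i with a candidate v would raise the loss, following Goodfellow et al. (2014), Miyato et al. (2016), and the HotFlip-style scoring of Ebrahimi et al. (2017). The function name and tensor arguments are hypothetical.

    # First-order (Taylor) estimate of the loss change from replacing each
    # input subword x_i with each vocabulary entry v:
    #   L(e(v)) - L(e(x_i)) ~= (e(v) - e(x_i)) . g_i,  g_i = dL/de(x_i)
    # A gradient-guided regularizer can then prefer, among segmentations of
    # the same surface string, the one with the highest estimated loss.
    import torch

    def score_candidates(loss, emb_matrix, input_embs):
        """loss: scalar; emb_matrix: [vocab, dim]; input_embs: [seq, dim]
        (must require grad). Returns [seq, vocab] first-order scores."""
        (grads,) = torch.autograd.grad(loss, input_embs)
        # g_i . e(v) for all v, minus g_i . e(x_i):
        return grads @ emb_matrix.T - (grads * input_embs).sum(-1, keepdim=True)

In practice the search would be restricted to candidate subwords that still yield a valid segmentation of the original sentence, so that only the segmentation, not the content, is perturbed.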

Background
Approach
Problem Definition
Adversarial Subword Regularization
Experimental Setup
Evaluation
Results on Low-Resource Dataset
Datasets and Implementation Details
Results on Out-Domain Dataset
Results on Synthetic Dataset
Related Work
Conclusions
Details of Training
Details of Experimental Settings
Sampled Translation Outputs