Augmenting Neural Machine Translation through Round-Trip Training Approach

Benyamin Ahmadnia, Bonnie J. Dorr

doi:10.1515/comp-2019-0019

Abstract

The quality of Neural Machine Translation (NMT), as a data-driven approach, depends heavily on the quantity, quality, and relevance of the training data. Such approaches have achieved promising results in high-resource bilingual scenarios but are inadequate under low-resource conditions. NMT systems generally learn from millions of words of bilingual training data; however, the human labeling process needed to produce such data is costly and time-consuming. In this paper, we describe a round-trip training approach to low-resource bilingual NMT that takes advantage of monolingual datasets to address the training-data bottleneck, thereby improving translation quality. We conduct detailed experiments on English-Spanish as a high-resource language pair and on Persian-Spanish as a low-resource language pair. Experimental results show that this approach outperforms the baseline systems and improves translation quality.
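The general round-trip idea can be illustrated with a toy sketch: a monolingual sentence is translated forward (English to Spanish), translated back (Spanish to English), and the reconstruction is compared with the original. No bilingual reference is needed, which is why monolingual data becomes usable. This is not the paper's model or training objective. It assumes hypothetical word-for-word lexicons (`EN_ES`, `ES_EN`) standing in for the two NMT systems and a simple positional-match reconstruction score standing in for whatever feedback signal a real system would use.

```python
from typing import Dict, Tuple

# Hypothetical toy "models": word-for-word lexicons standing in for two NMT systems.
# ES_EN deliberately maps "pescado" to "seafood" so the round trip is imperfect.
EN_ES: Dict[str, str] = {"the": "el", "cat": "gato", "eats": "come", "fish": "pescado"}
ES_EN: Dict[str, str] = {"el": "the", "gato": "cat", "come": "eats", "pescado": "seafood"}


def translate(sentence: str, lexicon: Dict[str, str]) -> str:
    """Translate word by word; unknown words are copied unchanged."""
    return " ".join(lexicon.get(tok, tok) for tok in sentence.split())


def reconstruction_score(original: str, reconstructed: str) -> float:
    """Fraction of token positions where the round-trip output matches the input."""
    src, rec = original.split(), reconstructed.split()
    if not src:
        return 0.0
    matches = sum(1 for a, b in zip(src, rec) if a == b)
    return matches / max(len(src), len(rec))


def round_trip(sentence: str, forward: Dict[str, str],
               backward: Dict[str, str]) -> Tuple[str, str, float]:
    """Forward-translate, back-translate, and score the reconstruction."""
    intermediate = translate(sentence, forward)
    reconstructed = translate(intermediate, backward)
    return intermediate, reconstructed, reconstruction_score(sentence, reconstructed)


if __name__ == "__main__":
    # Monolingual English input only: no Spanish reference is required.
    for sent in ["the cat eats fish", "the cat eats"]:
        print(sent, "->", round_trip(sent, EN_ES, ES_EN))
```

In this sketch, "the cat eats fish" round-trips to "the cat eats seafood" and scores 0.75, flagging a disagreement between the two directions; in an actual round-trip training setup, such a signal from monolingual text would be fed back to update both translation models.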

Highlights

The quality of Neural Machine Translation (NMT), as a data-driven approach, depends heavily on the quantity, quality, and relevance of the training data
NMT systems generally learn from millions of words of bilingual training data
We describe a round-trip training approach to low-resource bilingual NMT that takes advantage of monolingual datasets to address the training-data bottleneck, thereby improving translation quality