Abstract

Active learning (AL) for machine translation (MT) has been well-studied for the phrase-based MT paradigm, and several AL algorithms for data sampling have been proposed over the years. However, given the rapid advancement in neural methods, these algorithms have not been thoroughly investigated in the context of neural MT (NMT). In this work, we address this missing aspect by conducting a systematic comparison of different AL methods in a simulated AL framework. Our experimental setup uses: i) a state-of-the-art NMT architecture, to achieve realistic results; and ii) the same dataset (WMT'13 English-Spanish), to allow a fair comparison across methods. We then demonstrate how recent advancements in unsupervised pre-training and paraphrastic embeddings can be used to improve existing AL methods. Finally, we propose a neural extension of an AL sampling method originally used in phrase-based MT: Round Trip Translation Likelihood (RTTL). RTTL uses a bidirectional translation model to estimate the loss of information during translation, and it outperforms previous methods.
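The RTTL idea described above can be sketched in a few lines: translate a source sentence forward, then use the reverse model to score how well the original can be recovered; sentences with low round-trip likelihood are assumed to lose information and are prioritized for labeling. The function names below (`forward_translate`, `backward_loglik`) are hypothetical placeholders for an NMT system's decode and scoring interfaces, not code from the paper.

```python
def rttl_score(src, forward_translate, backward_loglik):
    """Round-trip score for one source sentence.

    Translate src with the forward model, then return the backward
    model's log-likelihood of reconstructing src from the hypothesis.
    A low score suggests information was lost in translation.
    """
    hyp = forward_translate(src)          # source -> target
    return backward_loglik(hyp, src)      # log P(source | target)

def select_batch(pool, forward_translate, backward_loglik, k):
    """Return the k pool sentences with the lowest round-trip likelihood,
    i.e. the candidates most worth sending to a human translator."""
    ranked = sorted(pool, key=lambda s: rttl_score(s, forward_translate,
                                                   backward_loglik))
    return ranked[:k]
```

In a real system the two scoring functions would be backed by forward and backward Transformer models; here they are left abstract so the selection logic is visible.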

Highlights

  • Active learning (AL) is an iterative supervised learning procedure where the learner is able to query an oracle for labeling new data points

  • We evaluate the AL approaches on BLEU; TER (Snover et al., 2006), which is based on edit distance; and BEER (Stanojević and Sima'an, 2015), which uses a linear model trained on a dataset of human evaluations

  • We performed an empirical evaluation of different AL methods for the state-of-the-art neural machine translation (MT) architecture, a missing aspect in prior work

Summary

Introduction

Active learning (AL) is an iterative supervised learning procedure in which the learner is able to query an oracle for labels on new data points. The few recently published papers in this direction (Peris and Casacuberta, 2018; Zhang et al., 2018) use LSTM-based MT systems, whereas the latest state-of-the-art systems are based on the Transformer architecture (Vaswani et al., 2017). These papers either investigate different algorithms of the same class or compare only a handful of methods from different classes. A global picture showing the effect of different AL methods on the same dataset for the state-of-the-art (SotA) MT system has been missing.
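The iterative query procedure described above can be sketched as a generic pool-based AL loop. The functions `train`, `score`, and `oracle_translate` are hypothetical placeholders (a model trainer, an informativeness scorer such as RTTL or uncertainty, and the human oracle), not interfaces from the paper.

```python
def active_learning_loop(labeled, pool, train, score, oracle_translate,
                         batch_size, rounds):
    """Pool-based active learning: repeatedly train a model, rank the
    unlabeled pool by an informativeness score, query the oracle for
    the top batch, and retrain on the enlarged labeled set."""
    labeled = list(labeled)       # copy so the caller's list is untouched
    model = train(labeled)
    for _ in range(rounds):
        # Highest score = most informative under the chosen criterion.
        ranked = sorted(pool, key=lambda s: score(model, s), reverse=True)
        batch = ranked[:batch_size]
        labeled += [(s, oracle_translate(s)) for s in batch]
        pool = [s for s in pool if s not in batch]
        model = train(labeled)    # retrain on the enlarged set
    return model, labeled
```

In the simulated AL framework the paper describes, the "oracle" is simply the reference translation from the parallel corpus, which makes the loop cheap to run at scale.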
