Abstract

In supervised learning, a well-trained model should be able to recover the ground truth accurately, i.e., its predicted labels should resemble the ground-truth labels as closely as possible. Inspired by this, we formulate a difficulty criterion based on the recovery degrees of training examples. Motivated by the intuition that, after skimming through the training corpus, the neural machine translation (NMT) model “knows” how to schedule a suitable curriculum according to learning difficulty, we propose a self-guided curriculum learning strategy that encourages the NMT model to learn from easy to hard on the basis of recovery degrees. Specifically, we adopt the sentence-level BLEU score as the proxy of recovery degree. Experimental results on translation benchmarks including WMT14 English-German and WMT17 Chinese-English demonstrate that our proposed method considerably improves the recovery degree, thus consistently improving translation performance.
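
To make the recovery-degree criterion concrete, the sketch below scores each training pair by the sentence-level BLEU between the model's own translation and the reference, then orders the corpus from easy (high recovery) to hard (low recovery). This is a minimal sketch rather than the authors' implementation: the `translate` function and the data format are assumptions, and sacrebleu is used here only as one convenient way to compute sentence-level BLEU.

```python
# Minimal sketch of a recovery-degree difficulty criterion (not the paper's code).
# Assumption: `translate(src)` returns the NMT model's hypothesis for a source sentence.
from typing import Callable, List, Tuple

import sacrebleu  # pip install sacrebleu


def rank_by_recovery_degree(
    pairs: List[Tuple[str, str]],      # (source, reference) training pairs
    translate: Callable[[str], str],   # hypothetical decoding function
) -> List[Tuple[str, str]]:
    """Order training pairs from easy (high recovery) to hard (low recovery)."""
    scored = []
    for src, ref in pairs:
        hyp = translate(src)
        # Sentence-level BLEU of the model's output against the reference
        # serves as a proxy for how well the model "recovers" this example.
        recovery = sacrebleu.sentence_bleu(hyp, [ref]).score
        scored.append((recovery, src, ref))
    # Higher recovery degree first, i.e. easy-to-hard curriculum order.
    scored.sort(key=lambda t: t[0], reverse=True)
    return [(src, ref) for _, src, ref in scored]
```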

Highlights

  • Inspired by the learning behavior of humans, Curriculum Learning (CL) for neural network training starts from the basic idea of “starting small”, namely that it is better to start from easier aspects of a task and progress towards aspects of increasing difficulty (Elman, 1993).

  • Training examples with higher recovery degrees are easier for the neural machine translation (NMT) model to master, while those with lower recovery degrees are likely to be more difficult.

  • Row 5 shows the results of our Transformer BASE implementation, and rows 6-7 show the results of our proposed CL models.

Summary

Introduction

Inspired by the learning behavior of humans, Curriculum Learning (CL) for neural network training starts from the basic idea of “starting small”, namely that it is better to start from easier aspects of a task and progress towards aspects of increasing difficulty (Elman, 1993). Bengio et al. (2009) achieve significant performance boosts on several tasks by forcing models to learn training examples in an order from “easy” to “difficult”. They further explain the CL method with two important constituents: how to rank training examples by learning difficulty, and how to schedule the presentation of training examples based on that rank. In the field of neural machine translation (NMT), empirical studies have shown that CL strategies contribute to both convergence speed and model performance (Zhang et al., 2018; Platanios et al., 2019; Zhang et al., 2019; Liu et al., 2020; Zhan et al., 2021; Ruiter et al., 2020). These CL strategies vary by difficulty criteria and curriculum schedules. Platanios et al. (2019) turn discrete numerical difficulty scores into relative probabilities and construct a competence-based curriculum that samples examples whose difficulty falls within the model's current competence, as sketched below.
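
To make the scheduling half of this recipe concrete, here is a small sketch in the spirit of Platanios et al. (2019): difficulty scores are mapped to relative (cumulative) probabilities, and at each step the trainer samples uniformly from the examples whose relative difficulty does not exceed the model's current competence. The square-root competence function and all variable names are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch of a competence-based curriculum schedule (in the spirit of
# Platanios et al., 2019); the competence function is an assumption.
import numpy as np


def relative_difficulty(scores: np.ndarray) -> np.ndarray:
    """Map raw difficulty scores to their empirical CDF values in (0, 1]."""
    ranks = scores.argsort().argsort()  # rank of each example, ascending difficulty
    return (ranks + 1) / len(scores)


def competence(step: int, total_steps: int, c0: float = 0.1) -> float:
    """Square-root competence schedule, growing from c0 to 1.0 over training."""
    t = min(step / total_steps, 1.0)
    return min(1.0, float(np.sqrt(t * (1.0 - c0 ** 2) + c0 ** 2)))


def sample_batch(rel_diff: np.ndarray, step: int, total_steps: int,
                 batch_size: int, rng: np.random.Generator) -> np.ndarray:
    """Sample indices uniformly from the examples the model is 'competent' for."""
    eligible = np.nonzero(rel_diff <= competence(step, total_steps))[0]
    return rng.choice(eligible, size=batch_size, replace=True)
```

Because competence starts at a nonzero c0 and grows to 1.0, the easiest slice of the corpus is available from the first step and the full corpus becomes available by the end of training.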
