Abstract

Recent studies have shown that reinforcement learning (RL) is an effective approach for improving the performance of neural machine translation (NMT) systems. However, due to its instability, successful RL training is challenging, especially in real-world systems where deep models and large datasets are leveraged. In this paper, taking several large-scale translation tasks as testbeds, we conduct a systematic study on how to train better NMT models using reinforcement learning. We provide a comprehensive comparison of several important factors in RL training (e.g., baseline reward, reward shaping). Furthermore, since it remains unclear whether RL is still beneficial when monolingual data is used, we propose a new method that leverages RL to further boost the performance of NMT systems trained with source/target monolingual data. By integrating all our findings, we obtain competitive results on the WMT14 English-German, WMT17 English-Chinese, and WMT17 Chinese-English translation tasks, and in particular set state-of-the-art performance on the WMT17 Chinese-English translation task.

Highlights

  • Multinomial sampling works better than beam search for reward computation, and combining reinforcement learning (RL) with monolingual data significantly improves neural machine translation (NMT) performance (a minimal sketch of the sampling-based reward follows this list)

  • We report the performance of the NMT model pretrained with the maximum likelihood estimation (MLE) loss
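
The first claim above concerns how candidate translations are drawn when computing the reward. Below is a minimal, self-contained PyTorch sketch of a REINFORCE-style loss that combines multinomial sampling with a mean-reward baseline. The tensor shapes, the `reward_fn` metric, and the use of fixed (teacher-forced) decoder logits in place of true autoregressive sampling are all simplifying assumptions for illustration, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def reinforce_loss(logits, refs, reward_fn):
    """REINFORCE-style loss with multinomial sampling and a baseline reward.

    logits    -- (batch, seq_len, vocab) decoder outputs (assumed shape)
    refs      -- list of reference token-id lists, one per batch element
    reward_fn -- sentence-level metric, e.g. smoothed BLEU (assumed)
    """
    # Multinomial sampling: draw each target token from the softmax
    # distribution instead of committing to the beam-search argmax.
    dist = torch.distributions.Categorical(probs=F.softmax(logits, dim=-1))
    samples = dist.sample()                          # (batch, seq_len)
    log_probs = dist.log_prob(samples).sum(dim=-1)   # sum over time steps
    rewards = torch.tensor(
        [reward_fn(h.tolist(), r) for h, r in zip(samples, refs)],
        dtype=logits.dtype,
    )
    # Baseline reward: subtracting the batch-mean reward reduces the
    # variance of the gradient estimate without biasing it.
    advantage = rewards - rewards.mean()
    # Negative sign: maximizing expected reward = minimizing this loss.
    return -(advantage * log_probs).mean()

# Toy usage with random logits and a trivial token-overlap "reward".
logits = torch.randn(2, 5, 100, requires_grad=True)
refs = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]
overlap = lambda hyp, ref: len(set(hyp) & set(ref)) / len(ref)
reinforce_loss(logits, refs, overlap).backward()
```

Sampling from the softmax exposes the model to diverse hypotheses whose rewards differ, which is what gives the policy gradient a useful learning signal; beam-search outputs are nearly deterministic and make the gradient estimate degenerate.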

Summary

Introduction

Neural machine translation (NMT) (Bahdanau et al., 2015; Hassan et al., 2018; Wu et al., 2016; He et al., 2017; Xia et al., 2016, 2017; Wu et al., 2018a,b) has become increasingly popular given its superior performance without the need for heavy hand-crafted engineering effort. An NMT model is usually trained to maximize the likelihood of each token in the target sentence, taking the source sentence and the preceding (ground-truth) target tokens as inputs. Such a training approach is referred to as maximum likelihood estimation (MLE) (Scholz, 1985). A closely related method was proposed under the name 'minimum risk training' (Shen et al., 2016), and these works demonstrate the effectiveness of RL techniques for NMT models (Wu et al., 2016).

Given $N$ training sentence pairs $\{x^i, y^i\}_{i=1}^{N}$, MLE is usually adopted to optimize the model, with the training objective defined as

$$L(\theta) = \sum_{i=1}^{N} \log P(y^i \mid x^i; \theta) = \sum_{i=1}^{N} \sum_{t=1}^{T_i} \log P(y_t^i \mid y_{<t}^i, x^i; \theta),$$

where $T_i$ is the length of the $i$-th target sentence and $y_{<t}^i$ denotes the target tokens preceding position $t$.

Our study builds on the Transformer architecture. The main difference between the Transformer and earlier models such as RNNSearch (Bahdanau et al., 2015) or ConvS2S (Gehring et al., 2017) is that the Transformer relies entirely on self-attention (Lin et al., 2017) to compute representations of the source- and target-side sentences, without using recurrent or convolutional operations.
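
For concreteness, the token-level MLE objective above reduces to a cross-entropy loss over teacher-forced decoder outputs. The following minimal PyTorch sketch illustrates this; the tensor shapes and the padding index are illustrative assumptions rather than details from the paper.

```python
import torch
import torch.nn.functional as F

def mle_loss(logits, tgt, pad_id=0):
    """Token-level MLE: negative log-likelihood of every ground-truth
    target token given the source and the preceding target tokens.

    logits -- (batch, seq_len, vocab) decoder outputs under teacher forcing
    tgt    -- (batch, seq_len) ground-truth target token ids
    pad_id -- assumed padding id; padded positions are excluded
    """
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),  # (batch * seq_len, vocab)
        tgt.reshape(-1),                      # (batch * seq_len,)
        ignore_index=pad_id,
    )
```

Minimizing this loss is equivalent to maximizing the objective $L(\theta)$ defined above.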
