Abstract

Pre-training sentence encoders is effective in many natural language processing tasks, including machine translation (MT) quality estimation (QE), due in part to the scarcity of the annotated QE data required for supervised learning. In this paper, we investigate the use of an intermediate self-supervised learning task for sentence encoders, aiming at improving QE performance at the sentence and word levels. Our approach is motivated by a problem inherent to QE: translation mistakes caused by wrongly inserted and deleted tokens. We modify the translation language model (TLM) training objective of the cross-lingual language model (XLM) to orient the pre-trained model towards the target task. The proposed method does not rely on annotated data and is complementary to QE methods involving pre-trained sentence encoders and domain adaptation. Experiments on English-to-German and English-to-Russian translation directions show that intermediate learning improves over domain-adapted models. Additionally, our method reaches results on par with state-of-the-art QE models without requiring the combination of several approaches, and outperforms similar methods based on pre-trained sentence encoders.
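
The abstract does not spell out how the self-supervised signal is constructed. As a rough illustration, one way to obtain insertion- and deletion-style errors without human annotation is to corrupt the target side of a parallel corpus and keep the corruption positions as token-level labels. The sketch below is a minimal illustration under that assumption; the sampling probabilities and the choice to label only insertions explicitly are hypothetical, not the paper's actual procedure:

```python
import random

def corrupt_target(tokens, vocab, p_del=0.1, p_ins=0.1, rng=random):
    """Create synthetic QE-style training data by corrupting a target
    sentence with random deletions and insertions.

    Returns the corrupted token sequence and one label per surviving
    token: 1 for a wrongly inserted token, 0 for an original token.
    (Illustrative sketch; probabilities and the labeling scheme are
    assumptions, not taken from the paper.)
    """
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < p_del:   # simulate a deletion error:
            continue               # the original token disappears
        corrupted.append(tok)
        labels.append(0)
        if rng.random() < p_ins:   # simulate an insertion error
            corrupted.append(rng.choice(vocab))
            labels.append(1)
    return corrupted, labels
```

Pairs of (source sentence, corrupted target) with these labels could then serve as an intermediate training task, requiring only parallel data rather than annotated QE corpora.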

Highlights

  • Machine translation (MT) quality estimation (QE) (Blatz et al., 2003; Quirk, 2004; Specia et al., 2009) aims at evaluating the quality of translation system outputs without relying on translation references, which are required by automatic evaluation metrics such as BLEU (Papineni et al., 2002) or TER (Snover et al., 2006).

  • Best performing QE methods from the latest WMT QE shared task (Fonseca et al., 2019) are based on two approaches: predictor–estimator (Kim et al., 2017) and QE-specific output layers on top of pre-trained contextual embeddings (Kim et al., 2019). While both approaches make use of sentence encoder models, such as BERT (Devlin et al., 2019) or XLM (Conneau and Lample, 2019), only the second approach allows for straightforward end-to-end learning and direct fine-tuning of the pre-trained language model.

  • All QE results reported in this paper were obtained on the official WMT'19 test set after selecting the best performing models on the validation set according to the official metrics.

Summary

Introduction

Machine translation (MT) quality estimation (QE) (Blatz et al., 2003; Quirk, 2004; Specia et al., 2009) aims at evaluating the quality of translation system outputs without relying on translation references, which are required by automatic evaluation metrics such as BLEU (Papineni et al., 2002) or TER (Snover et al., 2006). Best performing QE methods from the latest WMT QE shared task (Fonseca et al., 2019) are based on two approaches: predictor–estimator (Kim et al., 2017) and QE-specific output layers on top of pre-trained contextual embeddings (Kim et al., 2019). While both approaches make use of sentence encoder models, such as BERT (Devlin et al., 2019) or XLM (Conneau and Lample, 2019), only the second approach allows for straightforward end-to-end learning and direct fine-tuning of the pre-trained language model. To provide a smooth transition between pre-training and fine-tuning, an intermediate training step has been proposed (Phang et al., 2018), using large-scale labeled data relevant to the target task. This approach is limited by its reliance on annotated data for supervised learning. Our contribution focuses on intermediate training of pre-trained LMs for QE and is twofold: evaluating the impact of domain adaptation on the pre-trained model and designing a self-supervised learning task for intermediate training.
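
To make the second approach concrete, the sketch below places QE-specific output layers on top of a generic pre-trained encoder: a sentence-level regression head (e.g., for HTER-style scores) and a word-level OK/BAD tagging head, with the encoder fine-tuned end-to-end. The encoder interface and head shapes are assumptions for illustration; the paper itself builds on XLM, and this is not its exact architecture:

```python
import torch.nn as nn

class QEModel(nn.Module):
    """QE-specific output layers on top of a pre-trained sentence encoder
    (illustrative sketch; `encoder` is any module returning per-token
    hidden states for a concatenated source/target pair)."""

    def __init__(self, encoder, hidden_size):
        super().__init__()
        self.encoder = encoder                      # fine-tuned end-to-end
        self.sent_head = nn.Linear(hidden_size, 1)  # sentence-level score
        self.word_head = nn.Linear(hidden_size, 2)  # word-level OK/BAD logits

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        sent_score = self.sent_head(hidden[:, 0]).squeeze(-1)  # first-token summary
        word_logits = self.word_head(hidden)        # one logit pair per token
        return sent_score, word_logits
```

Any encoder exposing `last_hidden_state` fits this interface; for instance, a HuggingFace model loaded with `AutoModel.from_pretrained(...)`, with `hidden_size` taken from `encoder.config.hidden_size`.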

Methods
Results
Conclusion