Abstract

We propose using pre-trained embeddings as features of a regression model for sentence-level quality estimation of machine translation. We combine freely available BERT and LASER multilingual embeddings to train a neural regression model. In a second proposed method, the input features include not only the pre-trained embeddings but also the log probability from any machine translation (MT) system. Both methods are applied to several language pairs and are evaluated both as a classical quality estimation system (predicting the HTER score) and as an MT metric (predicting human judgements of translation quality).

Highlights

  • Quality estimation (Blatz et al, 2004; Specia et al, 2009) aims to predict the quality of machine translation (MT) outputs without human references, which is what sets it apart from translation metrics like BLEU (Papineni et al, 2002) or TER (Snover et al, 2006)

  • Input features: embeddings extracted from LASER and BERT, and the log probability obtained from a Transformer NMT model

  • Data: We gathered the data from the WMT16 to WMT18 shared tasks on sentence-level quality estimation for English-German (En-De) (Bojar et al, 2016a, 2017a; Specia et al, 2018), from WMT17 and WMT18 for German-English (De-En), and from WMT18 for English-Czech (En-Cs)
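The feature construction named in the highlights can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: random arrays stand in for real LASER and BERT sentence embeddings and for the MT log probability, the embedding sizes are arbitrary placeholders, the targets are synthetic, and a closed-form ridge regression stands in for the neural regression model the paper describes. The helper names `build_features`, `fit_ridge`, and `predict` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

n_sentences = 200
laser_dim, bert_dim = 16, 12  # placeholder sizes, not the real embedding sizes

# Random stand-ins for per-sentence features (assumption: one row per sentence).
laser_emb = rng.normal(size=(n_sentences, laser_dim))
bert_emb = rng.normal(size=(n_sentences, bert_dim))
mt_log_prob = rng.normal(size=(n_sentences, 1))


def build_features(laser, bert, log_prob=None):
    """Concatenate embeddings and, for the second method, the MT log probability."""
    parts = [laser, bert]
    if log_prob is not None:
        parts.append(log_prob)
    return np.concatenate(parts, axis=1)


def fit_ridge(features, targets, l2=1.0):
    """Closed-form ridge regression; the bias column is regularized too, for brevity."""
    x = np.hstack([features, np.ones((features.shape[0], 1))])
    gram = x.T @ x + l2 * np.eye(x.shape[1])
    return np.linalg.solve(gram, x.T @ targets)


def predict(weights, features):
    """Apply the fitted linear model to a feature matrix."""
    x = np.hstack([features, np.ones((features.shape[0], 1))])
    return x @ weights


features = build_features(laser_emb, bert_emb, mt_log_prob)
targets = rng.uniform(0.0, 1.0, size=n_sentences)  # synthetic stand-in for HTER
weights = fit_ridge(features, targets)
predictions = predict(weights, features)
```

Calling `build_features(laser_emb, bert_emb)` without the log probability corresponds to the first method (embeddings only); passing `mt_log_prob` corresponds to the second.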


Summary

Introduction

Quality estimation (Blatz et al, 2004; Specia et al, 2009) aims to predict the quality of machine translation (MT) outputs without human references, which is what sets it apart from translation metrics like BLEU (Papineni et al, 2002) or TER (Snover et al, 2006). Most approaches to quality estimation are trained to predict the post-editing effort, i.e. the number of corrections translators have to make in order to obtain an adequate translation. The effort is measured by the HTER metric (Snover et al, 2006) applied to human post-edits. In addition, we apply our method to predict direct human assessment (DA) (Graham et al, 2017). It is usually MT metrics (Ma et al, 2018) that are compared to DA, but we decided to compare our predictions to DA as well, because the number of post-edits and a human assessment are not the same thing. The main difference between MT metrics and quality estimation is that quality estimation is computed without reference sentences.
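To make the HTER target concrete, here is a simplified sketch, not the official TER tooling. It counts word-level insertions, deletions, and substitutions between the MT output and its human post-edit and divides by the post-edit length. The real TER algorithm also allows block shifts, which this sketch omits, so the value here is only an upper-bound approximation. Whitespace tokenization, the handling of an empty post-edit, and the function names `word_edit_distance` and `approx_hter` are assumptions for illustration.

```python
def word_edit_distance(hyp, ref):
    """Levenshtein distance over tokens (insertions, deletions, substitutions)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete a hypothesis token
                            curr[j - 1] + 1,      # insert a reference token
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def approx_hter(mt_output, post_edit):
    """Edits needed to turn the MT output into its post-edit, per post-edit word.

    Simplification: no shift operations, whitespace tokenization.
    """
    hyp = mt_output.split()
    ref = post_edit.split()
    if not ref:
        # Assumption: an empty post-edit scores 0 only if the output is empty too.
        return 0.0 if not hyp else 1.0
    return word_edit_distance(hyp, ref) / len(ref)


score = approx_hter("the cat sat in mat", "the cat sat on the mat")
```

In this example the post-edit has six words and two edits are needed (one substitution, one insertion), so `score` is 2/6. A quality estimation model is trained to predict such a score from the source sentence and the MT output alone, without seeing the post-edit.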

Architecture
Experimental Settings
Data and Results of HTER Prediction
Results
Data and Results for Human Assessment Prediction
Conclusions
