Abstract

The paper presents our participation in the WMT 2018 shared task on word-level quality estimation (QE) of machine-translated (MT) text, i.e., predicting whether a word in the MT output for a given source context is correctly translated, and hence should be retained in the post-edited translation (PE), or not. To perform the QE task, we measure the similarity of the source context of the target MT word with the contexts for which the word is retained in the PE in the training data. This is achieved in two different ways, using a Bag-of-Words (BoW) model and a Document-to-Vector (Doc2Vec) model. In the BoW model we compute the cosine similarity, while in the Doc2Vec model we consider the Doc2Vec similarity. By applying the Kneedle algorithm to the plot of F1-mult against similarity score, we derive the threshold on which the OK/BAD decisions for the MT words are based. Experimental results reveal that the Doc2Vec model performs better than the BoW model on the word-level QE task.
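The pipeline described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the knee finder is a simplified stand-in for the full Kneedle algorithm (farthest point from the chord of the curve), and the similarity and F1-mult numbers are invented for demonstration.

```python
from collections import Counter
from math import sqrt


def bow_cosine(ctx_a, ctx_b):
    """Cosine similarity between two contexts under a Bag-of-Words model."""
    a, b = Counter(ctx_a.lower().split()), Counter(ctx_b.lower().split())
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def knee_threshold(similarities, f1_mult):
    """Pick the similarity score at the 'knee' of the F1-mult curve: the
    point farthest from the chord joining the first and last points.
    A simplified stand-in for the full Kneedle algorithm."""
    x0, y0 = similarities[0], f1_mult[0]
    x1, y1 = similarities[-1], f1_mult[-1]

    def dist(x, y):
        # Perpendicular distance from (x, y) to the chord.
        return abs((y1 - y0) * x - (x1 - x0) * y + x1 * y0 - y1 * x0) / sqrt(
            (y1 - y0) ** 2 + (x1 - x0) ** 2
        )

    best = max(range(len(similarities)),
               key=lambda i: dist(similarities[i], f1_mult[i]))
    return similarities[best]


# Illustrative numbers only (not from the paper): F1-mult evaluated at
# candidate similarity thresholds on held-out data.
sims = [0.1, 0.3, 0.5, 0.7, 0.9]
f1 = [0.20, 0.60, 0.70, 0.72, 0.73]
threshold = knee_threshold(sims, f1)

# Label an MT word OK if its source context is similar enough to a
# context in which the word was retained in the post-edits.
sim = bow_cosine("the cat sat on the mat", "the cat lay on the mat")
label = "OK" if sim >= threshold else "BAD"
print(threshold, round(sim, 3), label)  # 0.3 0.875 OK
```

The same decision rule applies unchanged when `bow_cosine` is replaced by a Doc2Vec similarity; only the context representation differs between the two models.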

Highlights

  • Evaluating and estimating the quality of a machine translation (MT) system without referring to a reference translation is one of the key research areas in the MT domain (Blatz et al., 2004; Specia et al., 2009)

  • One model used discourse features and SVR, while another employed word-embedding features and a Gaussian Process for quality estimation. Bicici (2017) predicted translation performance with referential translation machines at the word, sentence and phrase levels. Blain et al. (2017) submitted a system based on bi-lexical word embeddings to the WMT17 QE shared task, which produced promising results in sentence-level quality estimation

  • The paper reports our participation in the WMT 2018 shared task on word-level quality estimation (QE Task 2) on English–German SMT data


Summary

Introduction

Evaluating and estimating the quality of a machine translation (MT) system without referring to a reference translation is one of the key research areas in the MT domain (Blatz et al., 2004; Specia et al., 2009). In a machine-translated document, quality estimation can be performed at various granularities, such as the word, phrase or sentence level (Specia et al., 2010, 2013). Scarton et al. (2016) participated in the WMT16 document-level quality estimation task with two different models, obtaining the winning results (Bojar et al., 2016). Blain et al. (2017) submitted a system based on bi-lexical word embeddings to the WMT17 QE shared task, which produced promising results in sentence-level quality estimation. Bengio et al. (2003) proposed a neural probabilistic language model using a distributed representation of words. Mikolov et al. (2013a) proposed a novel approach to representing words as fixed-length vectors, widely known as the word2vec model, and reported state-of-the-art performance on a word similarity task; their work showed that a word can be predicted from a context by adding two word vectors from the same context. Le and Mikolov (2014) extended this model to a vector representation of a document, known as the Paragraph Vector model or, commonly, the Document-to-Vector (Doc2Vec) model

