Abstract

In this paper we study the effect of pre-editing rules on the quality of the translations produced by the MT system Lucy LT when translating English news texts into Spanish. We carried out an error annotation of the first 200 segments of the News Crawl: articles from 2014 corpus and devised a set of 8 pre-editing rules. The application of these rules to a different set of segments from the same corpus results in a reduction of the word error rate of about 11%.
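For readers unfamiliar with the metric, the following is a minimal sketch of how a word error rate (WER) is commonly computed: the word-level edit distance between an MT output and a reference translation, divided by the reference length. It is not the paper's evaluation code. The function name `word_error_rate`, the whitespace tokenisation and the Spanish example sentences are all assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: tokens are obtained by splitting on whitespace; a real
    evaluation would normally apply a proper tokeniser first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from the output
                curr[j - 1] + 1,     # extra word in the output
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical sentences, not taken from the corpus used in the paper.
reference = "el gato está en la casa"
raw_output = "el gato es en casa"           # MT output without pre-editing
pre_edited_output = "el gato está en casa"  # MT output after pre-editing

wer_raw = word_error_rate(reference, raw_output)
wer_pre_edited = word_error_rate(reference, pre_edited_output)
print(f"WER without pre-editing: {wer_raw:.3f}")
print(f"WER with pre-editing:    {wer_pre_edited:.3f}")
```

Comparing the two scores over a held-out set of segments is one way to quantify the effect of a set of pre-editing rules. Whether the reported figure of about 11% is an absolute or a relative reduction is not stated in the abstract.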

Highlights

  • There are two main activities in the automation of the translation process: post-editing and pre-editing

  • In this paper we study the effect of pre-editing rules on the quality of the translations produced by the rule-based machine translation (MT) system Lucy LT when translating English news texts into Spanish

  • We devised a set of pre-editing rules to minimise, as far as possible, the translation errors described in the previous section, which Lucy LT makes when translating English news texts into Spanish


Summary

Introduction

There are two main activities in the automation of the translation process: post-editing and pre-editing. To this end, we annotated the errors found in the first 200 segments of the News Crawl: articles from 2014 corpus using the open-source software translate and the Multidimensional Quality Metrics (MQM; Lommel et al., 2014) framework, which we adapted to our needs. Productivity, that is, the time a post-editor needs to correct an MT output to make it adequate for the intended purpose. She tried with and without a customised MT system (enriched with specific terminology) as well as with and without pre-edited texts. Thicke’s study differs from ours in the language pair, the nature of the texts to be translated and the measure used. Both studies show a reduction in the post-editing effort when Kohl’s and other pre-editing rules are applied to the source text.

Frameworks for the annotation of errors
Multidimensional Quality Metrics
TAUS Dynamic Quality Framework
Methodology
Error analysis
Pre-editing rules and their evaluation
Acronyms are not always translated
Words composed of EN
The comma before the EN1
There are word order EN
Findings
There are gender and EN