Abstract

In this paper we report the results of the first experiments with HMEANT (a semi-automatic evaluation metric that assesses translation utility by matching semantic role fillers) on the Russian language. We developed a web-based annotation interface and used it to evaluate the practicability of this metric in the MT research and development process. We studied the reliability, language independence, labor cost and discriminatory power of HMEANT by evaluating English-Russian translations produced by several MT systems. Role labeling and alignment were performed by two groups of annotators, one with a linguistic background and one without it. The experimental results were mixed: very high inter-annotator agreement at the role labeling stage dropped to much lower values at the role alignment stage, and the good correlation of HMEANT with human ranking at the system level decreased significantly at the sentence level. Analysis of the experimental results and of the annotators’ feedback suggests that the HMEANT annotation guidelines need some adaptation for Russian.
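
The abstract reports inter-annotator agreement without specifying in this excerpt which agreement statistic was used. The sketch below shows one common way such pairwise agreement on role labels could be quantified (Cohen's kappa), purely as an illustration; all labels in it are hypothetical and are not data from the study.

```python
# Illustrative only: Cohen's kappa for two annotators over categorical role labels.
# The study may use a different agreement measure; this is just a worked example.
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of labels."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected chance agreement from each annotator's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    return (observed - expected) / (1 - expected)

# Hypothetical role labels assigned by two annotators to the same ten fillers.
annotator_1 = ["Agent", "Patient", "Agent", "Locative", "Agent",
               "Patient", "Temporal", "Agent", "Patient", "Locative"]
annotator_2 = ["Agent", "Patient", "Agent", "Locative", "Patient",
               "Patient", "Temporal", "Agent", "Patient", "Temporal"]

print(f"kappa = {cohen_kappa(annotator_1, annotator_2):.2f}")
```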

Highlights

  • Measuring translation quality is one of the most important tasks in MT; although its history began long ago, most of the currently used approaches and metrics have been developed during the last two decades

  • The BLEU (Papineni et al., 2002), NIST (Doddington, 2002) and METEOR (Banerjee and Lavie, 2005) metrics require a reference translation to compare with MT output in a fully automatic mode, which resulted in a dramatic speed-up of MT research and development

  • The underlying annotation cycle of HMEANT consists of two stages: semantic role labeling (SRL) and alignment
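
As a rough illustration of how the output of these two annotation stages can be turned into a score, the sketch below computes an F-measure over role fillers that annotators aligned between MT output and a reference. It deliberately simplifies the actual HMEANT definition (which weights predicates and individual role types and distinguishes partial matches): it assumes uniform weights and binary matches, and all identifiers and data are hypothetical.

```python
# Simplified, illustrative HMEANT-style scoring: uniform weights, binary matches.
# An "alignment" here is just a set of (mt_filler_id, ref_filler_id) pairs
# produced by the human alignment stage after semantic role labeling.

def f1_score(aligned_pairs, mt_fillers, ref_fillers):
    """F1 over role fillers aligned between MT output and the reference."""
    if not mt_fillers or not ref_fillers or not aligned_pairs:
        return 0.0
    precision = len(aligned_pairs) / len(mt_fillers)   # matched share of MT fillers
    recall = len(aligned_pairs) / len(ref_fillers)     # matched share of reference fillers
    return 2 * precision * recall / (precision + recall)

# Hypothetical annotation of one sentence:
mt_fillers = ["mt_agent", "mt_action", "mt_patient", "mt_time"]     # labeled in MT output
ref_fillers = ["ref_agent", "ref_action", "ref_patient"]            # labeled in reference
aligned = {("mt_agent", "ref_agent"), ("mt_action", "ref_action")}  # judged to match

print(f"sentence score = {f1_score(aligned, mt_fillers, ref_fillers):.2f}")
```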


Summary

Introduction

Measuring translation quality is one of the most important tasks in MT; although its history began long ago, most of the currently used approaches and metrics have been developed during the last two decades. The BLEU (Papineni et al., 2002), NIST (Doddington, 2002) and METEOR (Banerjee and Lavie, 2005) metrics require a reference translation to compare with MT output in a fully automatic mode, which resulted in a dramatic speed-up of MT research and development. These metrics correlate with manual MT evaluation and provide reliable evaluation for many languages and for different types of MT systems. An alternative approach worth mentioning is the one proposed by Snover et al. (2006), known as HTER, which measures the quality of machine translation in terms of the post-editing effort required to correct the MT output. This method was shown to correlate well with human adequacy judgments, though it was not designed for the task of gisting. HTER is not widely used in machine translation evaluation because of its high labor intensity.
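
To make the reference-based, fully automatic setup concrete, below is a stripped-down sentence-level BLEU sketch: clipped modified n-gram precision combined with a brevity penalty. It omits the smoothing and corpus-level aggregation used in real implementations, and the example sentences are made up purely for illustration.

```python
# Minimal sentence-level BLEU sketch: modified n-gram precision plus brevity penalty.
# Real implementations aggregate counts over a whole corpus and apply smoothing.
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    log_precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        matched = sum(min(count, ref[g]) for g, count in cand.items())  # clipped counts
        total = sum(cand.values())
        if matched == 0 or total == 0:
            return 0.0  # no smoothing in this sketch
        log_precisions.append(math.log(matched / total))
    # Brevity penalty discourages overly short candidates.
    bp = 1.0 if len(candidate) > len(reference) else math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(sum(log_precisions) / max_n)

reference = "the annotators aligned the role fillers in both sentences".split()
candidate = "the annotators aligned role fillers in both the sentences".split()
print(f"BLEU = {bleu(candidate, reference):.2f}")
```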

