Abstract

Human ranking of machine translation output is a commonly used method for comparing different innovations in machine translation research. Though simple in theory, comparing multiple translations is in practice cognitively complex, requiring judges to balance the weight of different types of translation errors in the context of the whole sentence. This cognitive complexity is made evident by low intra- and inter-annotator agreement, which calls into question the reliability of such ranking metrics. HMEANT (Lo and Wu, 2011) attempted to reduce the complexity of ranking by dividing sentences into smaller semantic units whose translation alignments were more objective, rendering the task cognitively simpler. However, HMEANT does not discern how these semantic units are related, and it relies heavily on language-dependent verb frames, a significant problem for a translation metric. This project defines HCOMET (Human COgnitive Metric for Evaluating Translation), a new human evaluation metric. To overcome the limitations of HMEANT, HCOMET employs a cognitively informed annotation scheme and new scoring guidelines. While its inter-annotator agreement did not surpass that of HMEANT, the conceptual framework of HCOMET allows for a much more detailed analysis of semantic adequacy in machine translation.
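The abstract's central quantitative notion is inter-annotator agreement. As a point of reference, the sketch below computes Cohen's kappa, a standard chance-corrected agreement measure for two annotators; it is purely illustrative and not taken from the paper: the judges, labels, and data are hypothetical.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
    agreement rate and p_e is the agreement expected by chance
    given each annotator's marginal label distribution.
    """
    assert len(labels_a) == len(labels_b), "annotators must label the same items"
    n = len(labels_a)
    # Observed agreement: fraction of items where the two judges match.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from the marginal frequency of each label.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical example: two judges ranking the same eight MT outputs
# as "better", "worse", or "tie" relative to a baseline translation.
judge_1 = ["better", "worse", "tie", "better", "worse", "tie", "better", "worse"]
judge_2 = ["better", "worse", "tie", "worse", "worse", "better", "better", "tie"]
print(f"kappa = {cohen_kappa(judge_1, judge_2):.3f}")  # kappa = 0.429
```

Kappa near 0 indicates agreement no better than chance; values well below 1, as in this toy run, are the kind of result that motivates simplifying the annotation task in the first place.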
