Abstract

The article discusses modern metrics for evaluating translation quality that are used in the development and tuning of machine translation (MT) systems, in MT competitions, and in evaluating the performance of some other NLP systems. The authors describe the criteria for evaluating translation quality and several methods of expert (human) evaluation. The article also explains the mechanisms of automatic metrics (such as BLEU, TER, METEOR, BERTScore, and COMET), their features, advantages, and disadvantages. The authors emphasize the importance of the BERTScore and COMET metrics and explain the continuing popularity of some traditional metrics (e.g., BLEU). Modern translation quality metrics give distorted results when the text contains numerous expressions with indirect meanings: poetic tropes, metaphors, metonymy, humor, or riddles. Communication through indirect meanings is linked to the human ability to think in contradictions. Contradictions are a source of insight and were used by Donald Davidson to describe the mechanism of metaphor. However, communication through indirect meanings remains difficult to computerize, which is why metric-based evaluation of professional literary translations shows poor results. Further development of metrics should incorporate computer processing of contradictions, possibly with the help of non-classical logics: paracomplete, paraconsistent, and dialetheic.
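To illustrate why surface-overlap metrics such as BLEU penalize figurative language, the following minimal Python sketch implements a simplified single-reference sentence BLEU (clipped n-gram precisions with add-one smoothing and a brevity penalty). The smoothing scheme and the example sentences are illustrative assumptions of ours, not the canonical BLEU or sacrebleu implementation; the point is only that an adequate paraphrase of an idiom shares few n-grams with a literal reference and therefore scores low:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU against a single reference:
    geometric mean of clipped n-gram precisions (n = 1..max_n)
    multiplied by a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # clip each candidate n-gram count by its count in the reference
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        # add-one smoothing (an illustrative choice) so one empty
        # n-gram order does not zero out the whole score
        precisions.append((overlap + 1) / (total + 1))
    # brevity penalty: punish candidates shorter than the reference
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

reference = "he kicked the bucket last night"
literal = "he kicked the bucket last night"       # exact match
figurative = "he passed away last night"          # adequate paraphrase

print(round(bleu(literal, reference), 3))
print(round(bleu(figurative, reference), 3))
```

The literal copy scores 1.0, while the semantically faithful paraphrase drops sharply because it shares almost no n-grams with the reference; embedding-based metrics such as BERTScore and COMET were designed to close exactly this gap.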
