Abstract

The article discusses modern metrics for evaluating translation quality that are used in the development and tuning of machine translation (MT) systems, in MT competitions, and in evaluating the performance of some other NLP systems. The authors describe the criteria for evaluating translation quality and several methods of expert (human) evaluation. The article also explains the mechanisms of automatic metrics (such as BLEU, TER, METEOR, BERTScore, and COMET), their features, advantages, and disadvantages. The authors emphasize the importance of the BERTScore and COMET metrics and explain the continuing popularity of some traditional metrics (e.g., BLEU). Modern translation quality metrics give distorted results when the text contains numerous expressions with indirect meanings: poetic tropes, metaphors, metonymy, humor, or riddles. Communication through indirect meanings is linked to the human ability to think in contradictions. Contradictions are a source of insight and were used by Donald Davidson to describe the mechanism of metaphor. However, communication through indirect meanings remains difficult to computerize, which is why metric-based evaluation of professional literary translations shows poor results. Further development of metrics should incorporate computer processing of contradictions, possibly with the help of non-classical logics: paracomplete, paraconsistent, and dialetheic.
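To illustrate why surface-overlap metrics such as BLEU penalize figurative language, the following minimal Python sketch implements a simplified single-reference sentence BLEU (clipped n-gram precisions with add-one smoothing and a brevity penalty). The smoothing scheme and the example sentences are illustrative assumptions of ours, not the canonical BLEU or sacrebleu implementation; the point is only that an adequate paraphrase of an idiom shares few n-grams with a literal reference and therefore scores low:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU against a single reference:
    geometric mean of clipped n-gram precisions (n = 1..max_n)
    multiplied by a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # clip each candidate n-gram count by its count in the reference
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        # add-one smoothing (an illustrative choice) so one empty
        # n-gram order does not zero out the whole score
        precisions.append((overlap + 1) / (total + 1))
    # brevity penalty: punish candidates shorter than the reference
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

reference = "he kicked the bucket last night"
literal = "he kicked the bucket last night"       # exact match
figurative = "he passed away last night"          # adequate paraphrase

print(round(bleu(literal, reference), 3))
print(round(bleu(figurative, reference), 3))
```

The literal copy scores 1.0, while the semantically faithful paraphrase drops sharply because it shares almost no n-grams with the reference; embedding-based metrics such as BERTScore and COMET were designed to close exactly this gap.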
