DiBiMT: A Gold Evaluation Benchmark for Studying Lexical Ambiguity in Machine Translation

Federico Martelli, Tina Munda, Niccolò Campolungo, Stefano Parrella, Carole Tiberius, Roberto Navigli, Svetla Koeva

doi:10.1162/coli_a_00541

Abstract

Despite the remarkable progress made in the field of Machine Translation (MT), current systems still struggle when translating ambiguous words, especially when these express infrequent meanings. In order to investigate and analyze the impact of lexical ambiguity on automatic translations, several tasks and evaluation benchmarks have been proposed over the course of the last few years. However, work in this research direction suffers from critical shortcomings. Indeed, existing evaluation datasets are not entirely manually curated, which significantly compromises their reliability. Furthermore, current literature fails to provide detailed insights into the nature of the errors produced by models translating ambiguous words, lacking a thorough manual analysis across languages. With a view to overcoming these limitations, we propose Disambiguation Biases in MT (DiBiMT), an entirely manually curated evaluation benchmark for investigating disambiguation biases in eight language combinations and assessing the ability of both commercial and non-commercial systems to handle ambiguous words. We also examine and detail the errors produced by models in this scenario by carrying out a manual error analysis in all language pairs. Additionally, we perform an extensive array of experiments aimed at studying the behavior of models when dealing with ambiguous words. Finally, we show the ineffectiveness of standard MT evaluation settings for assessing the disambiguation capabilities of systems and highlight the need for additional efforts in this research direction and ad-hoc testbeds such as DiBiMT. Our benchmark is available at: https://nlp.uniroma1.it/dibimt/.

Journal: Computational Linguistics
Publication Date: Dec 2, 2024
License type: CC BY-NC-ND 4.0

