The Flores-101 Evaluation Benchmark for Low-Resource and Multilingual Machine Translation

Naman Goyal, Peng-Jen Chen, Marc’Aurelio Ranzato, Da Ju, Vishrav Chaudhary, Francisco Guzmán, Sanjana Krishnan, Cynthia Gao, Angela Fan, Guillaume Wenzek

doi:10.1162/tacl_a_00474

Abstract

One of the biggest challenges hindering progress in low-resource and multilingual machine translation is the lack of good evaluation benchmarks. Current evaluation benchmarks either lack good coverage of low-resource languages, consider only restricted domains, or are low quality because they are constructed using semi-automatic procedures. In this work, we introduce the Flores-101 evaluation benchmark, consisting of 3001 sentences extracted from English Wikipedia and covering a variety of different topics and domains. These sentences have been translated into 101 languages by professional translators through a carefully controlled process. The resulting dataset enables better assessment of model quality on the long tail of low-resource languages, including the evaluation of many-to-many multilingual translation systems, as all translations are fully aligned. By publicly releasing such a high-quality and high-coverage dataset, we hope to foster progress in the machine translation community and beyond.


Journal: Transactions of the Association for Computational Linguistics
Publication Date: May 4, 2022
Citations: 36
License type: CC BY 4.0


