Abstract

Neural Machine Translation (NMT) has shown strong improvements for high-resource languages, where large parallel corpora are available, but it still performs poorly for low-resource languages. Despite many proposals to address this problem, it remains a difficult challenge, and it becomes even more complicated when the few available resources cover only a single domain. To combat this issue, we propose a new approach to improve NMT for low-resource languages. Using the Transformer model, the proposed approach yields BLEU score improvements of 5.3, 5.0, and 3.7 for the Gamo-English, Gofa-English, and Dawuro-English language pairs, respectively, where Gamo, Gofa, and Dawuro are related low-resource Ethiopian languages. We discuss our contributions and envisage future steps in this challenging research area.

Keywords: Machine translation, Low-resource machine translation, Neural machine translation, Ethiopian languages, Mixed training
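The reported gains are corpus-level BLEU scores. As a minimal illustration of how such scores are typically computed (this sketch is not taken from the paper, and the sentences below are invented placeholders), the sacrebleu library can score a set of system hypotheses against reference translations:

```python
# Minimal sketch of corpus-level BLEU evaluation with sacrebleu.
# The hypotheses and references here are hypothetical placeholders,
# not data from the Gamo/Gofa/Dawuro-English test sets.
import sacrebleu

# Hypothetical system outputs for a small test set.
hypotheses = [
    "the farmer planted maize in the field",
    "rain fell on the mountains yesterday",
]

# One reference translation per hypothesis; sacrebleu takes a list of
# reference streams, so multiple references per sentence are supported.
references = [
    "the farmer planted maize in his field",
    "rain fell on the mountains yesterday",
]

bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU: {bleu.score:.1f}")  # corpus-level BLEU on a 0-100 scale
```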
