Unsupervised Pivot-based Neural Machine Translation for English to Kannada

Hemalatha Gadugoila, Deepa Gupta, Priyanka C Nair, Shailashree K Sheshadri

doi:10.1109/indicon56171.2022.10039732

Abstract

Neural Machine Translation (NMT) is an approach to Machine Translation (MT) that has made significant progress in recent years for Indic languages. Although many contributions have been made for a number of Indic language pairs, there is still no model that can be considered a benchmark, owing to the scarcity of rich parallel corpora, syntactic, semantic and morphological divergence across languages, differences in sentence ordering, and so on. The lack of a rich parallel corpus has been a major hindrance to building efficient NMT systems. Unsupervised NMT (UNMT) was introduced to address this concern, as it relies only on monolingual corpora. In this paper, we propose a pivot-based UNMT model for English to Kannada translation, with Telugu as the pivot language, using a monolingual corpus of 1 lakh (100,000) sentences for each language. To further enhance translation quality, the pre-trained mBART model is used. Because only a partial corpus is used, a conventional UNMT model is built on it so that the proposed model can be compared with a state-of-the-art approach. The conventional UNMT model and the proposed architecture achieve BLEU scores of 0.2 and 0.5 respectively, indicating that the pivot-based approach with Telugu as the pivot improves translation accuracy.
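As a rough illustration of the pivot idea, the sketch below chains an English-to-Telugu step with a Telugu-to-Kannada step. It assumes the simplest cascaded reading of "pivot-based" translation; the architecture proposed in the paper may combine the pivot differently. The names `pivot_translate`, `en_to_te` and `te_to_kn`, and the toy romanized word lists, are hypothetical stand-ins for trained UNMT models and are not taken from the paper.

```python
from typing import Callable

# A translator is modeled as any function from a sentence to a sentence.
Translator = Callable[[str], str]


def pivot_translate(src_to_pivot: Translator, pivot_to_tgt: Translator) -> Translator:
    """Compose two translators into one that routes through a pivot language."""
    def translate(sentence: str) -> str:
        pivot_sentence = src_to_pivot(sentence)   # e.g. English -> Telugu
        return pivot_to_tgt(pivot_sentence)       # e.g. Telugu -> Kannada
    return translate


# Toy word-for-word lookups (romanized, illustrative only) standing in
# for trained models; unknown words are passed through unchanged.
EN_TE = {"water": "neellu"}
TE_KN = {"neellu": "neeru"}


def en_to_te(sentence: str) -> str:
    return " ".join(EN_TE.get(word, word) for word in sentence.split())


def te_to_kn(sentence: str) -> str:
    return " ".join(TE_KN.get(word, word) for word in sentence.split())


en_to_kn = pivot_translate(en_to_te, te_to_kn)
```

In this cascaded form, any error made in the first step is carried into the second, which is one reason a pivot closely related to the target language is usually preferred.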

Full Text