Abstract

Natural language translation is a well-defined task of language technology that reduces the communication gap among people of diverse linguistic backgrounds. Although neural machine translation attains remarkable translation performance, it requires an adequate amount of training data, which is difficult to obtain for low-resource language pairs. Neural machine translation also handles the rare-word problem, i.e., the translation of low-frequency words, at the subword level, but it remains weak on highly inflected languages. In this work, we explore neural machine translation on the low-resource English-Assamese language pair with a proposed transliteration approach in the data preprocessing step. In this approach, the source language is transliterated into the target language script, which yields a smaller shared subword vocabulary for the source and target languages. Moreover, embeddings pre-trained on the monolingual data of the transliterated source and target languages are used in the training process. With our approach, neural machine translation significantly improves translation performance for both English-to-Assamese and Assamese-to-English translation and obtains state-of-the-art results.
