Abstract

Natural language translation is a well-defined task in language technology that narrows the communication gap between people of diverse linguistic backgrounds. Although neural machine translation attains remarkable translation performance, it requires an adequate amount of training data, which is difficult to obtain for low-resource language pairs. In addition, neural machine translation handles the rare-word problem, i.e., the translation of low-frequency words, at the subword level, but it remains weak at translating highly inflected languages. In this work, we explore neural machine translation for the low-resource English-Assamese language pair, using a proposed transliteration approach in the data preprocessing step. In this approach, the source language is transliterated into the target-language script, so that a smaller subword vocabulary covers both the source and target languages. Moreover, embeddings pre-trained on monolingual data of the transliterated source language and of the target language are used in the training process. With our approach, neural machine translation significantly improves translation performance for both English-to-Assamese and Assamese-to-English translation and obtains state-of-the-art results.
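The claim that transliterating the source into the target script shrinks the shared subword vocabulary can be illustrated with a toy sketch. It assumes a hypothetical letter-by-letter Latin-to-Assamese table (TOY_MAP) and uses the set of distinct characters, which is the starting inventory a subword learner such as BPE builds merges from, as a rough proxy for vocabulary size. The mapping, the function names and the word samples are illustrative assumptions, not the transliteration scheme or data used in this work.

```python
# Toy illustration only: a HYPOTHETICAL letter-by-letter Latin-to-Assamese
# mapping. Real English-to-Assamese transliteration is phonetic and
# context-dependent; this table is not the method described in the abstract.
TOY_MAP = {
    "a": "আ", "b": "ব", "c": "ক", "d": "ড", "e": "এ", "f": "ফ",
    "g": "গ", "h": "হ", "i": "ই", "j": "জ", "k": "ক", "l": "ল",
    "m": "ম", "n": "ন", "o": "অ", "p": "প", "q": "ক", "r": "ৰ",
    "s": "ছ", "t": "ট", "u": "উ", "v": "ভ", "w": "ৱ", "x": "ক",
    "y": "য", "z": "জ",
}


def transliterate(text: str) -> str:
    """Map each Latin letter to an Assamese-script character; keep all others."""
    return "".join(TOY_MAP.get(ch, ch) for ch in text.lower())


def symbol_inventory(*corpora: list[str]) -> set[str]:
    """Distinct non-space characters across all corpora, i.e. the base
    symbols a subword learner such as BPE would start from."""
    return {
        ch
        for corpus in corpora
        for line in corpus
        for ch in line
        if not ch.isspace()
    }


english = ["book", "table"]     # toy English sample
assamese = ["কিতাপ", "টেবুল"]    # toy Assamese sample (glosses: book, table)

# Joint inventory with the source left in Latin script.
before = symbol_inventory(english, assamese)
# Joint inventory after transliterating the source into Assamese script.
after = symbol_inventory([transliterate(s) for s in english], assamese)

print(len(before), len(after))
```

Because transliterated source characters fall into the same script as the target, some of them coincide with symbols already present on the target side, so the joint inventory can only stay the same size or shrink. A real system would measure the effect on the learned subword vocabulary rather than on raw characters.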
