Neural machine translation for Tamil to English

Minni Jain, Ravneet Punia, Ishika Hooda

doi:10.1080/09720510.2020.1799582

Abstract

The Tamil language is spoken by 80 million people around the world. Translation between Tamil and English has a significant impact: it aids the understanding of Tamil scripts, which would otherwise be a tedious, costly, and time-consuming process. An automated system for Tamil-to-English translation would therefore save human time and effort. We publicly release a new high-quality corpus for standard training and evaluation, and report the results of experiments with two different Encoder-Decoder architectures for translating Tamil to English. We further tried to improve these models by experimenting with pre-trained word embeddings and tuning hyperparameters. Although Google Translator also provides translation from Tamil to English and vice versa, our implemented architectures, along with the new dataset, outperformed Google Translator by a margin of 7.5 BLEU points. Moreover, our proposed model addresses the out-of-vocabulary and polysemy problems to a greater extent. Our dataset and implementation are available at: https://github.com/Ishikahooda/Tamil-English-Dataset

Full Text