Abstract

The Tamil language is spoken by 80 million people around the world. Translation between Tamil and English has a significant impact because it aids the understanding of Tamil scripts, which would otherwise be a tedious, costly, and time-consuming process. Developing an automated system for Tamil-to-English translation would therefore save human time and effort. We publicly release a new high-quality corpus for standard training and evaluation, and report the results of experiments with two different Encoder-Decoder-based architectures for translating Tamil to English. We further tried to improve these models by experimenting with pre-trained word embeddings and tuning hyperparameters. Although Google Translate also supports translation between Tamil and English in both directions, our implemented architectures, trained on the new dataset, outperformed Google Translate by a margin of 7.5 BLEU points. Moreover, our proposed model handles out-of-vocabulary and polysemy problems to a greater extent. Our dataset and implementation are available at: https://github.com/Ishikahooda/Tamil-English-Dataset

