Abstract

Neural Machine Translation (NMT) is widely employed for language translation tasks because it outperforms conventional statistical and phrase-based approaches. However, NMT techniques pose challenges of their own: they require a large, clean corpus of parallel data, struggle with rare words, and must be made faster for real-time applications. More work is needed to apply NMT to Sanskrit, one of the oldest and richest languages known to the world, given its morphological richness and limited multilingual parallel corpus. Parallel data is rarely available for a Sanskrit language pair; hence, no application exists so far that can translate Sanskrit to or from other languages. This study presents an in-depth analysis addressing these challenges using the low-resource Sanskrit-Hindi language pair. We employ novel training-corpus filtering with an extended vocabulary in a zero-shot transformer architecture. The structure of the Sanskrit language is thoroughly investigated to justify each step. Furthermore, the proposed method is analyzed across variations in sentence length and is also applied to a high-resource language pair to demonstrate its efficacy.
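The abstract does not detail the corpus-filtering criteria; as an illustration only, the sketch below shows one common family of filters for noisy low-resource parallel data (empty sides, excessive length, implausible length ratio, exact duplicates). The function name, thresholds, and toy sentence pairs are all assumptions, not the paper's method.

```python
def filter_parallel_corpus(pairs, max_len=80, max_ratio=2.0):
    """Keep (src, tgt) pairs that are non-empty, not overly long,
    not duplicated, and whose token-length ratio is plausible.
    Thresholds are illustrative, not from the paper."""
    seen = set()
    kept = []
    for src, tgt in pairs:
        s, t = src.split(), tgt.split()
        if not s or not t:
            continue  # drop pairs with an empty side
        if len(s) > max_len or len(t) > max_len:
            continue  # drop overly long sentences
        if max(len(s), len(t)) / min(len(s), len(t)) > max_ratio:
            continue  # drop likely misaligned pairs
        if (src, tgt) in seen:
            continue  # drop exact duplicates
        seen.add((src, tgt))
        kept.append((src, tgt))
    return kept

# Toy Sanskrit-Hindi pairs (illustrative, not from the study's corpus)
corpus = [
    ("rāmaḥ vanam gacchati", "राम वन जाता है"),
    ("rāmaḥ vanam gacchati", "राम वन जाता है"),        # exact duplicate
    ("saḥ", "वह बहुत तेज़ी से घर की ओर दौड़ रहा था"),      # implausible ratio
    ("", "खाली"),                                       # empty source side
]
print(len(filter_parallel_corpus(corpus)))  # → 1 (only the first pair survives)
```

Such filtering typically precedes subword-vocabulary construction, so that rare or misaligned sentences do not inflate the extended vocabulary.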

