Abstract

Neural Machine Translation (NMT) is widely employed for language translation tasks because it outperforms conventional statistical and phrase-based approaches. However, NMT techniques pose challenges of their own: they require a large, clean corpus of parallel data, struggle with rare words, and must be made faster for real-time applications. More work is needed to apply NMT to Sanskrit, one of the oldest and richest languages known to the world, given its morphological richness and limited multilingual parallel corpus. Parallel data is rarely available for a Sanskrit language pair; hence, no application exists so far that can translate Sanskrit to or from other languages. This study presents an in-depth analysis addressing these challenges using the low-resource Sanskrit-Hindi language pair. We employ novel training-corpus filtering with an extended vocabulary in a zero-shot transformer architecture. The structure of the Sanskrit language is thoroughly investigated to justify each step. Furthermore, the proposed method is analyzed across variations in sentence length and is also applied to a high-resource language pair to demonstrate its efficacy.
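The abstract does not detail the corpus-filtering criteria; as an illustration only, the sketch below shows one common family of filters for noisy low-resource parallel data (empty sides, excessive length, implausible length ratio, exact duplicates). The function name, thresholds, and toy sentence pairs are all assumptions, not the paper's method.

```python
def filter_parallel_corpus(pairs, max_len=80, max_ratio=2.0):
    """Keep (src, tgt) pairs that are non-empty, not overly long,
    not duplicated, and whose token-length ratio is plausible.
    Thresholds are illustrative, not from the paper."""
    seen = set()
    kept = []
    for src, tgt in pairs:
        s, t = src.split(), tgt.split()
        if not s or not t:
            continue  # drop pairs with an empty side
        if len(s) > max_len or len(t) > max_len:
            continue  # drop overly long sentences
        if max(len(s), len(t)) / min(len(s), len(t)) > max_ratio:
            continue  # drop likely misaligned pairs
        if (src, tgt) in seen:
            continue  # drop exact duplicates
        seen.add((src, tgt))
        kept.append((src, tgt))
    return kept

# Toy Sanskrit-Hindi pairs (illustrative, not from the study's corpus)
corpus = [
    ("rāmaḥ vanam gacchati", "राम वन जाता है"),
    ("rāmaḥ vanam gacchati", "राम वन जाता है"),        # exact duplicate
    ("saḥ", "वह बहुत तेज़ी से घर की ओर दौड़ रहा था"),      # implausible ratio
    ("", "खाली"),                                       # empty source side
]
print(len(filter_parallel_corpus(corpus)))  # → 1 (only the first pair survives)
```

Such filtering typically precedes subword-vocabulary construction, so that rare or misaligned sentences do not inflate the extended vocabulary.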

