Abstract

With the advent of deep learning, Machine Translation (MT) migrated from Statistical Machine Translation (SMT), which had dominated MT for a few decades, towards Neural Machine Translation (NMT) architectures. NMT gradually made its way into Indian MT research, where it has been applied to many language pairs. Numerous NMT architectures circulate in the international and national research community, and many claim to be state of the art. Although NMT for Indic languages (ILNMT) gives better results for widely spoken language pairs, translation quality for many other pairs remains low owing to a lack of sufficient resources. Automated machine translation models are unavailable for some less widely spoken Indic languages, such as Kashmiri and Dogri. Hence, there is growing demand for research that addresses the challenge of developing usable MT models even when only minuscule training data is available. Based on corpus availability, languages are categorized as High Resource Languages (HRLs), Low Resource Languages (LRLs), and Zero Resource Languages (ZRLs), and many Indic languages are classified into these categories accordingly. This literature survey aims to be a single collective source of information on the predominant ILNMT architectures, the toolkits available for building NMT models, and the various pre-trained language models needed by researchers who contribute to the ILNMT research community. The survey covers ILNMT architectures for a range of Indic languages, e.g., Hindi and Tamil (HRLs), Kannada and Marathi (LRLs), and Sinhala and Nepali (ZRLs). While a few language-specific surveys on ILNMT exist, this is one of the first surveys to gather all of this information under one canopy.