Improving Multilingual Neural Machine Translation System for Indic Languages

Sudhansu Bala Das, Atharv Biradar, Bidyut Kr Patra, Tapas Kumar Mishra

doi:10.1145/3587932

Abstract

The Machine Translation System (MTS) serves as an effective tool for communication by translating text or speech from one language to another. Recently, neural machine translation (NMT) has become popular for its performance and cost-effectiveness. However, NMT systems struggle to translate low-resource languages because a large quantity of data is required to learn useful mappings across languages. The need for an efficient translation system is evident in a large multilingual environment like India, where Indian languages (ILs) are still treated as low-resource languages due to the unavailability of corpora. To address this asymmetry, multilingual neural machine translation (MNMT) has emerged as an ideal approach. An MNMT system translates among many languages using a single model, which simplifies the training process and lowers online maintenance costs; it can also improve translation quality for low-resource languages. In this article, we propose an MNMT system to address the issues related to low-resource language translation. Our model comprises two MNMT systems, one for English-Indic (one-to-many) and one for Indic-English (many-to-one) translation, built on a shared encoder-decoder and covering 15 language pairs (30 translation directions). Since most IL pairs have only a scanty amount of parallel data, insufficient for training a machine translation model, we explore various data augmentation strategies to improve the overall translation quality of the proposed model. The proposed model is implemented with a state-of-the-art Transformer architecture. In addition, the article examines the use of language relationships (in terms of dialect, script, etc.), particularly the role of high-resource languages from the same family in boosting the performance of low-resource languages.
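One widely used way to let a single shared encoder-decoder produce several target languages, as in the one-to-many English-Indic system described above, is to prepend a token naming the desired output language to each source sentence. The abstract does not say whether the authors use this scheme, so the sketch below is an assumption: the `<2xx>` tag format, the helper names `add_target_tag` and `build_one_to_many_corpus`, and the ISO 639-1 language codes are hypothetical illustrations.

```python
# Minimal sketch (assumption): target-language tagging for a one-to-many
# English-Indic model. The "<2xx>" tag format, helper names, and language
# codes are hypothetical choices for illustration, not taken from the article.
from typing import Iterable, List, Tuple


def add_target_tag(source: str, target_lang: str) -> str:
    """Prefix the source sentence with a token naming the desired output language."""
    return f"<2{target_lang}> {source}"


def build_one_to_many_corpus(
    pairs: Iterable[Tuple[str, str, str]],
) -> List[Tuple[str, str]]:
    """Merge (source, target, target_lang) triples from several language pairs
    into one tagged training corpus for a single shared encoder-decoder."""
    return [(add_target_tag(src, lang), tgt) for src, tgt, lang in pairs]


if __name__ == "__main__":
    data = [
        ("Good morning", "सुप्रभात", "hi"),  # English -> Hindi
        ("Good morning", "সুপ্রভাত", "bn"),  # English -> Bengali
    ]
    for src, tgt in build_one_to_many_corpus(data):
        print(src, "=>", tgt)
```

Under this scheme the many-to-one Indic-English system needs no tag, because its output language is always English; the shared encoder simply sees source sentences from every Indic language.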
Moreover, the experimental results show the benefit of back-translation and domain adaptation for ILs in enhancing the translation quality of both source and target languages. Using all these key approaches, our proposed model performs better than the baseline model in terms of the BLEU (Bilingual Evaluation Understudy) score for a set of ILs.
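BLEU, the metric reported in the article, scores a system output by its clipped n-gram overlap with a reference translation, scaled by a brevity penalty. The sketch below is a simplified sentence-level version under stated assumptions (single reference, whitespace tokenization, n-grams up to 4 with uniform weights, no smoothing); the function names are hypothetical. Published scores are normally computed at corpus level with a standard tool such as sacreBLEU and reported on a 0 to 100 scale, so numbers from this sketch are not comparable to the article's.

```python
# Minimal sketch (assumption): simplified sentence-level BLEU with one
# reference, whitespace tokenization, uniform 1- to 4-gram weights, no smoothing.
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Return BLEU in [0, 1] for one candidate against one reference."""
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        matches = sum(min(count, ref_counts[g]) for g, count in cand_counts.items())
        if total == 0 or matches == 0:
            return 0.0  # any zero precision makes unsmoothed BLEU zero
        log_precision_sum += math.log(matches / total) / max_n
    # Brevity penalty: candidates shorter than the reference are penalized.
    c, r = len(cand), len(ref)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_precision_sum)


if __name__ == "__main__":
    print(sentence_bleu("the cat sat on the mat", "the cat sat on a mat"))
```

For the example pair the clipped precisions are 5/6, 3/5, 2/4 and 1/3 and the lengths match, so the score is the geometric mean (1/12)^(1/4), roughly 0.54.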

Journal: ACM Transactions on Asian and Low-Resource Language Information Processing
Publication Date: Jun 16, 2023
Citations: 10

Similar Papers

Baidu Translate: Research and Products
Zhongjun He
01 Jan 2015

Low Resource Neural Machine Translation: Assamese to/from Other Indo-Aryan (Indic) Languages
Rupjyoti Baruah ... Rajesh Kumar Mundotiya
ACM Transactions on Asian and Low-Resource Language Information Processing | VOL. 21 | 16 Nov 2021

Neural Machine Translation System for English to Indian Language Translation Using MTIL Parallel Corpus
B Premjith ... K.P Soman
Journal of Intelligent Systems | VOL. 28 | 20 Mar 2019

Human Versus Automatic Evaluation of NMT for Low-Resource Indian Language
Goutam Datta ... Kusum Gupta
01 Jan 2023
