A Domain Specific Parallel Corpus and Enhanced English-Assamese Neural Machine Translation

Sahinur Rahman-Laskar, Partha Pakray, Riyanka Manna, Sivaji Bandyopadhyay

doi:10.13053/cys-26-4-4423

Abstract

Machine translation is the automatic translation of text from one natural language to another. Neural machine translation is a widely adopted corpus-based approach, but it requires an adequate amount of training data, and domain-wise parallel corpora are needed to improve translation performance and coverage across domains. In this work, we prepare a domain-specific parallel corpus for the low-resource English-Assamese pair, covering the Agriculture, Government Office, Judiciary, Social Media, Tourism, COVID-19, Sports, and Literature domains. We tackle data scarcity and word-order divergence through data augmentation and prior alignment. We also contribute an Assamese pre-trained language model and Assamese word embeddings built from Assamese monolingual data, along with a bilingual dictionary-based post-processing step, to enhance transformer-based neural machine translation. We achieve state-of-the-art results in both the forward (English-to-Assamese) and backward (Assamese-to-English) translation directions.
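The abstract does not spell out how the bilingual dictionary-based post-processing step works. As a rough illustration only, one common form of such a step replaces source-language words that the model left untranslated in its output with their dictionary equivalents. The sketch below assumes that form; the function name `post_process`, the toy dictionary, and the Latin-script heuristic are all hypothetical and are not taken from the paper.

```python
import re
from typing import Mapping

# Hypothetical toy English-to-Assamese dictionary, for illustration only.
# It is not drawn from the paper's resources.
TOY_EN_AS_DICT = {
    "agriculture": "কৃষি",
    "tourism": "পৰ্যটন",
}

# Assumption: any run of Latin letters in Assamese output is an
# English word the model copied instead of translating.
LATIN_WORD = re.compile(r"[A-Za-z]+")


def post_process(translation: str, dictionary: Mapping[str, str]) -> str:
    """Replace leftover English words with their dictionary translations.

    Words missing from the dictionary are left unchanged.
    """
    def swap(match: "re.Match[str]") -> str:
        word = match.group(0)
        return dictionary.get(word.lower(), word)

    return LATIN_WORD.sub(swap, translation)


if __name__ == "__main__":
    # Model output with two English words left untranslated.
    print(post_process("agriculture আৰু tourism", TOY_EN_AS_DICT))
```

A lookup of this kind only touches tokens the model failed to translate, so it leaves fluent output unchanged. A real system would also need to handle inflection, multi-word entries, and named entities that should stay in Latin script.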

Journal: Computación y Sistemas
Publication Date: Dec 25, 2022
Citations: 1

Similar Papers

Multilingual Neural Translation
14 Feb 2020

A Study of Machine Translation Models for Kannada-Tulu
Asha Hegde ... Bharathi Raja Chakravarthi
01 Jan 2023

Adaptation in Statistical Machine Translation for Low-resource Domains in English-Vietnamese Language
Nghia-Luan Pham ... Van-Vinh Nguyen
VNU Journal of Science: Computer Science and Communication Engineering | VOL. 36
30 May 2020

Effective preprocessing based neural machine translation for English to Telugu cross-language information retrieval
B N V Narasimha Raju ... M S V S Bhadri Raju
IAES International Journal of Artificial Intelligence (IJ-AI) | VOL. 10
01 Jun 2021
