N-gram based Machine Translation for English-Assamese: Two Languages with High Syntactical Dissimilarity

Zakir Hussain, Research Scholar, Department of CSE, NIT Silchar, Assam, India; Malaya Dutta Borah, Assistant Professor, Department of CSE, NIT Silchar, Assam, India; Abdul Hannan, Faculty, Department of IT, Gauhati University, Assam, India

doi:10.35940/ijeat.b2320.129219

Open Access

Abstract

To bridge the language barrier faced by people in the northeastern region of India, a machine translation system is necessary. Many people in this region cannot access services because they do not understand the language in which those services are offered. Among the several languages spoken there, Assamese is one of the major languages of northeast India. Machine translation support for Assamese is limited compared with other languages, so many Assamese speakers cannot benefit from it. This paper focuses on developing an English-to-Assamese translation system using an n-gram model. Compared with other models, the n-gram model works well for language pairs with high syntactic dissimilarity. The value of n plays a major role in the quality and efficiency of the system: the Bilingual Evaluation Understudy (BLEU) score varies significantly as the n-gram order changes. The model uses tuples to reduce excess memory consumption and to speed up translation. A parallel corpus was used to train the n-gram-based decoder MARIE. The number of translation units extracted with the n-gram model is much smaller than the number extracted with a phrase-based model, which has a strong effect on system efficiency.
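The claim that the n-gram model extracts far fewer translation units than a phrase-based model can be illustrated with a small sketch. It assumes the tuple definition used in n-gram-based SMT generally (a unique, monotone segmentation of a word-aligned sentence pair into minimal bilingual units) and standard consistent-phrase extraction for the phrase-based side. It is not the authors' code and does not reproduce MARIE. The function names, the sentence pair, its romanized Assamese rendering and the word alignment are all hypothetical, and unaligned (NULL) words are not handled. In this framework the translation model is an n-gram model over the tuple sequence, p(S, T) ≈ ∏_k p((s,t)_k | (s,t)_{k-n+1}, ..., (s,t)_{k-1}), so `tuple_ngrams` shows which contexts a model of order n would be estimated from.

```python
def extract_tuples(src, tgt, links):
    """Monotone segmentation of an aligned sentence pair into minimal tuples.

    Simplifying assumption: every source and target word is aligned
    (no NULL-alignment handling, no tuple unfolding).
    """
    tuples = []
    s0 = t0 = 0
    while s0 < len(src):
        # Start with the next source word and grow the block until no
        # alignment link crosses its source or target boundary.
        s1, t1 = s0 + 1, t0
        changed = True
        while changed:
            changed = False
            for i, j in links:
                if s0 <= i < s1 and j >= t1:  # source word links past target edge
                    t1, changed = j + 1, True
                if t0 <= j < t1 and i >= s1:  # target word links past source edge
                    s1, changed = i + 1, True
        tuples.append((" ".join(src[s0:s1]), " ".join(tgt[t0:t1])))
        s0, t0 = s1, t1
    return tuples


def extract_phrases(src, tgt, links, max_len=7):
    """All phrase pairs consistent with the alignment (phrase-based style)."""
    phrases = set()
    for s_start in range(len(src)):
        for s_end in range(s_start, min(len(src), s_start + max_len)):
            # Target positions linked to the source span [s_start, s_end].
            tps = [j for (i, j) in links if s_start <= i <= s_end]
            if not tps:
                continue
            t_start, t_end = min(tps), max(tps)
            if t_end - t_start + 1 > max_len:
                continue
            # Consistency: no target word in the span links outside the source span.
            if any(t_start <= j <= t_end and not s_start <= i <= s_end
                   for (i, j) in links):
                continue
            phrases.add((" ".join(src[s_start:s_end + 1]),
                         " ".join(tgt[t_start:t_end + 1])))
    return phrases


def tuple_ngrams(tuples, n):
    """n-grams over a tuple sequence, padded with sentence-boundary markers."""
    padded = [("<s>", "<s>")] * (n - 1) + list(tuples) + [("</s>", "</s>")]
    return [tuple(padded[k:k + n]) for k in range(len(padded) - n + 1)]


# Hypothetical example: English SVO vs. romanized Assamese SOV.
src = "I eat rice".split()
tgt = "moi bhat khao".split()
links = {(0, 0), (1, 2), (2, 1)}  # I-moi, eat-khao, rice-bhat

print(extract_tuples(src, tgt, links))       # [('I', 'moi'), ('eat rice', 'bhat khao')]
print(len(extract_phrases(src, tgt, links)))  # 5
```

Because the verb and object are reordered between the two languages, they end up in a single tuple, so this pair yields 2 tuples against 5 consistent phrase pairs. More generally, a sentence of L source words can never produce more than L tuples, while consistent phrase extraction can produce up to roughly L × max_len phrase pairs.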