Abstract

Much work on automatic translation has addressed major European language pairs by exploiting large-scale parallel corpora, but very little research has been conducted on the Amharic-Arabic language pair because parallel data for it is scarce; in particular, no benchmark parallel Amharic-Arabic text corpus is available for the Machine Translation task. Therefore, a small parallel Quranic text corpus is constructed by aligning the existing monolingual Arabic text with its equivalent Amharic translation, both available from the Tanzil project. Experiments are carried out on two Neural Machine Translation (NMT) models, one based on Long Short-Term Memory (LSTM) units and one on Gated Recurrent Units (GRU), both using the attention-based encoder-decoder architecture adapted from the open-source OpenNMT system. The LSTM-based and GRU-based NMT models and the Google Translate system are compared, and the LSTM-based OpenNMT model is found to outperform the GRU-based OpenNMT model and Google Translate, with BLEU scores of 12%, 11%, and 6%, respectively.
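
For illustration, the following is a minimal sketch (not the authors' evaluation script) of how such a comparison can be scored with corpus-level BLEU using the sacrebleu library; the file names are hypothetical placeholders for each system's output and the shared reference translations.

```python
# Sketch: corpus-level BLEU scoring of several systems against one reference set.
# File names are hypothetical placeholders, not from the paper.
import sacrebleu

def score_system(hypothesis_path: str, reference_path: str) -> float:
    """Return the corpus BLEU score (0-100 scale) for one system's output."""
    with open(hypothesis_path, encoding="utf-8") as f:
        hypotheses = [line.strip() for line in f]
    with open(reference_path, encoding="utf-8") as f:
        references = [line.strip() for line in f]
    # sacrebleu expects a list of reference streams (one list per reference set).
    bleu = sacrebleu.corpus_bleu(hypotheses, [references])
    return bleu.score

if __name__ == "__main__":
    for name, path in [("LSTM", "lstm.hyp.txt"),
                       ("GRU", "gru.hyp.txt"),
                       ("Google Translate", "google.hyp.txt")]:
        print(name, round(score_system(path, "test.ref.txt"), 2))
```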

Highlights

  • Construction of a parallel corpus is very challenging and requires costly human expertise

  • A parallel corpus consists of aligned texts in which every expression in one language can be readily matched to its counterpart in the other language, and it is one of the most significant resources for Machine Translation (MT) tasks [2]

  • We adopted the OpenNMT attention-based encoder-decoder architecture to construct the Amharic-Arabic Neural Machine Translation (NMT) model, because attention mechanisms are increasingly used to enhance NMT performance by selectively focusing on sub-parts of the sentence during translation [25] (see the sketch after these highlights)
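
As referenced above, the following is a minimal PyTorch sketch, not the authors' OpenNMT code, of an attention-based encoder-decoder in which the recurrent cell can be switched between LSTM and GRU; the embedding size, hidden size, and vocabulary sizes are illustrative assumptions.

```python
# Minimal sketch (not the authors' OpenNMT code) of an attention-based
# encoder-decoder whose recurrent cell can be swapped between LSTM and GRU.
# Vocabulary, embedding, and hidden sizes are illustrative assumptions.
import torch
import torch.nn as nn

def make_rnn(cell: str, input_size: int, hidden_size: int) -> nn.Module:
    """Return a batch-first LSTM or GRU layer."""
    rnn_cls = {"lstm": nn.LSTM, "gru": nn.GRU}[cell]
    return rnn_cls(input_size, hidden_size, batch_first=True)

class Seq2SeqAttn(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, emb=256, hid=512, cell="lstm"):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = make_rnn(cell, emb, hid)
        self.decoder = make_rnn(cell, emb, hid)
        self.out = nn.Linear(hid * 2, tgt_vocab)  # [decoder state; context]

    def forward(self, src_ids, tgt_ids):
        enc_out, enc_state = self.encoder(self.src_emb(src_ids))
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), enc_state)
        # Dot-product attention over encoder states.
        scores = torch.bmm(dec_out, enc_out.transpose(1, 2))       # B x T_tgt x T_src
        context = torch.bmm(torch.softmax(scores, dim=-1), enc_out)
        return self.out(torch.cat([dec_out, context], dim=-1))     # B x T_tgt x V

# Identical pipeline; only the recurrent cell differs between the two models.
lstm_model = Seq2SeqAttn(src_vocab=8000, tgt_vocab=8000, cell="lstm")
gru_model = Seq2SeqAttn(src_vocab=8000, tgt_vocab=8000, cell="gru")
src = torch.randint(0, 8000, (2, 12))   # toy batch: 2 sentences, 12 tokens each
tgt = torch.randint(0, 8000, (2, 10))
print(lstm_model(src, tgt).shape)        # torch.Size([2, 10, 8000])
```

Keeping everything else fixed and swapping only the recurrent cell mirrors the kind of LSTM-versus-GRU comparison the paper reports.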


Summary

INTRODUCTION

Construction of a parallel corpus is very challenging and requires costly human expertise. Machine Translation (MT), the task of automatically translating text from one natural language to another, is an important application of Computational Linguistics (CL) and Natural Language Processing (NLP). Neural Machine Translation (NMT) is a recent deep learning approach to MT that can produce high-quality translations when trained on a massive amount of aligned parallel text in both the source and target languages [1]. NMT is an end-to-end learning approach that uses a large artificial neural network to predict the likelihood of a sequence of words, typically modeling entire sentences in a single integrated model. The advantage of this approach is that a single system can be trained directly on the source and target text, no longer requiring the pipeline of specialized systems used in statistical MT.
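
To make "predict the likelihood of a sequence of words" concrete, the short sketch below shows how per-position decoder scores are combined into a sentence-level log-likelihood; the random logits stand in for the output of any trained encoder-decoder model and are purely illustrative.

```python
# Sketch: turning per-token decoder scores into a sentence-level likelihood.
# The logits tensor stands in for the output of any NMT decoder; its values
# here are random and purely illustrative.
import torch

vocab_size, tgt_len = 8000, 6
logits = torch.randn(tgt_len, vocab_size)               # decoder scores per target position
target_ids = torch.randint(0, vocab_size, (tgt_len,))   # reference target tokens

log_probs = torch.log_softmax(logits, dim=-1)            # log p(word | prefix, source)
token_log_probs = log_probs[torch.arange(tgt_len), target_ids]
sentence_log_likelihood = token_log_probs.sum()          # log p(target sentence | source)
print(float(sentence_log_likelihood))
```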

TRANSLATION CHALLENGES OF AMHARIC AND ARABIC LANGUAGES
RELATED WORKS
CONSTRUCTION OF AMHARIC ARABIC PARALLEL TEXT CORPUS
EXPERIMENTS AND RESULTS
CONCLUSION