Abstract

Much work on automatic translation has addressed major European language pairs by exploiting large-scale parallel corpora, but very little research has been conducted on the Amharic-Arabic language pair because parallel data for it is scarce; in particular, no benchmark parallel Amharic-Arabic text corpus is available for the Machine Translation task. Therefore, a small parallel Quranic text corpus is constructed by aligning the existing monolingual Arabic text with its equivalent Amharic translation, both available from the Tanzil project. Experiments are carried out on two Neural Machine Translation (NMT) models, one based on Long Short-Term Memory (LSTM) units and one on Gated Recurrent Units (GRU), both using the attention-based encoder-decoder architecture adapted from the open-source OpenNMT system. The LSTM-based and GRU-based NMT models and the Google Translate system are compared, and the LSTM-based OpenNMT model is found to outperform the GRU-based OpenNMT model and Google Translate, with BLEU scores of 12%, 11%, and 6%, respectively.
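
For illustration, the following is a minimal sketch (not the authors' evaluation script) of how such a comparison can be scored with corpus-level BLEU using the sacrebleu library; the file names are hypothetical placeholders for each system's output and the shared reference translations.

```python
# Sketch: corpus-level BLEU scoring of several systems against one reference set.
# File names are hypothetical placeholders, not from the paper.
import sacrebleu

def score_system(hypothesis_path: str, reference_path: str) -> float:
    """Return the corpus BLEU score (0-100 scale) for one system's output."""
    with open(hypothesis_path, encoding="utf-8") as f:
        hypotheses = [line.strip() for line in f]
    with open(reference_path, encoding="utf-8") as f:
        references = [line.strip() for line in f]
    # sacrebleu expects a list of reference streams (one list per reference set).
    bleu = sacrebleu.corpus_bleu(hypotheses, [references])
    return bleu.score

if __name__ == "__main__":
    for name, path in [("LSTM", "lstm.hyp.txt"),
                       ("GRU", "gru.hyp.txt"),
                       ("Google Translate", "google.hyp.txt")]:
        print(name, round(score_system(path, "test.ref.txt"), 2))
```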

Highlights

  • Construction of a parallel corpus is very challenging and requires costly human expertise

  • A parallel corpus consists of aligned texts in which every expression in one language can be readily matched to its counterpart in the other language, and it is one of the most significant resources for Machine Translation (MT) tasks [2]

  • We adopted the OpenNMT attention-based encoder-decoder architecture to construct the Amharic-Arabic Neural Machine Translation (NMT) model, because attention mechanisms are increasingly used to enhance NMT performance by selectively focusing on sub-parts of the sentence during translation [25] (see the sketch after these highlights)
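
As referenced above, the following is a minimal PyTorch sketch, not the authors' OpenNMT code, of an attention-based encoder-decoder in which the recurrent cell can be switched between LSTM and GRU; the embedding size, hidden size, and vocabulary sizes are illustrative assumptions.

```python
# Minimal sketch (not the authors' OpenNMT code) of an attention-based
# encoder-decoder whose recurrent cell can be swapped between LSTM and GRU.
# Vocabulary, embedding, and hidden sizes are illustrative assumptions.
import torch
import torch.nn as nn

def make_rnn(cell: str, input_size: int, hidden_size: int) -> nn.Module:
    """Return a batch-first LSTM or GRU layer."""
    rnn_cls = {"lstm": nn.LSTM, "gru": nn.GRU}[cell]
    return rnn_cls(input_size, hidden_size, batch_first=True)

class Seq2SeqAttn(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, emb=256, hid=512, cell="lstm"):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = make_rnn(cell, emb, hid)
        self.decoder = make_rnn(cell, emb, hid)
        self.out = nn.Linear(hid * 2, tgt_vocab)  # [decoder state; context]

    def forward(self, src_ids, tgt_ids):
        enc_out, enc_state = self.encoder(self.src_emb(src_ids))
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), enc_state)
        # Dot-product attention over encoder states.
        scores = torch.bmm(dec_out, enc_out.transpose(1, 2))       # B x T_tgt x T_src
        context = torch.bmm(torch.softmax(scores, dim=-1), enc_out)
        return self.out(torch.cat([dec_out, context], dim=-1))     # B x T_tgt x V

# Identical pipeline; only the recurrent cell differs between the two models.
lstm_model = Seq2SeqAttn(src_vocab=8000, tgt_vocab=8000, cell="lstm")
gru_model = Seq2SeqAttn(src_vocab=8000, tgt_vocab=8000, cell="gru")
src = torch.randint(0, 8000, (2, 12))   # toy batch: 2 sentences, 12 tokens each
tgt = torch.randint(0, 8000, (2, 10))
print(lstm_model(src, tgt).shape)        # torch.Size([2, 10, 8000])
```

Keeping everything else fixed and swapping only the recurrent cell mirrors the kind of LSTM-versus-GRU comparison the paper reports.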


Summary

INTRODUCTION

Construction of a parallel corpus is very challenging and requires costly human expertise. Machine Translation (MT), the task of automatically translating text from one natural language to another, is an important application of Computational Linguistics (CL) and Natural Language Processing (NLP). Neural Machine Translation (NMT) is a recent deep learning approach to MT that can produce high-quality translations when trained on a massive amount of aligned parallel text in both the source and target languages [1]. NMT is an end-to-end learning approach that uses a large artificial neural network to predict the likelihood of a sequence of words, typically modeling entire sentences in a single integrated model. The advantage of this approach is that a single system can be trained directly on the source and target text, no longer requiring the pipeline of specialized systems used in statistical MT.
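
To make "predict the likelihood of a sequence of words" concrete, the short sketch below shows how per-position decoder scores are combined into a sentence-level log-likelihood; the random logits stand in for the output of any trained encoder-decoder model and are purely illustrative.

```python
# Sketch: turning per-token decoder scores into a sentence-level likelihood.
# The logits tensor stands in for the output of any NMT decoder; its values
# here are random and purely illustrative.
import torch

vocab_size, tgt_len = 8000, 6
logits = torch.randn(tgt_len, vocab_size)               # decoder scores per target position
target_ids = torch.randint(0, vocab_size, (tgt_len,))   # reference target tokens

log_probs = torch.log_softmax(logits, dim=-1)            # log p(word | prefix, source)
token_log_probs = log_probs[torch.arange(tgt_len), target_ids]
sentence_log_likelihood = token_log_probs.sum()          # log p(target sentence | source)
print(float(sentence_log_likelihood))
```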

TRANSLATION CHALLENGES OF AMHARIC AND ARABIC LANGUAGES
RELATED WORKS
CONSTRUCTION OF AMHARIC ARABIC PARALLEL TEXT CORPUS
EXPERIMENTS AND RESULTS
CONCLUSION