Abstract

Deep Learning is one of the most promising approaches to machine translation. It has been proven to achieve impressive results when large amounts of parallel data are available, as is the case for high-resource languages. Nevertheless, for low-resource languages such as the Arabic dialects, Deep Learning models fail due to the lack of available parallel corpora. In this article, we present a method to build a parallel corpus and use it to train an effective NMT model that translates Tunisian Dialect texts from social networks into MSA. To this end, we propose a set of data augmentation methods aiming to increase the size of the existing state-of-the-art parallel corpus. Evaluating the impact of this step, we observed that it effectively boosts both the size and the quality of the corpus. Then, using the resulting corpus, we compare the effectiveness of CNN, RNN and transformer models for translating Tunisian Dialect into MSA. Experiments show that the best translation is achieved by the transformer model, with a BLEU score of 60, versus 33.36 and 53.98 for the RNN and CNN models, respectively.
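The abstract reports corpus-level BLEU scores for the three systems. As a minimal sketch (not the authors' code), such a comparison could be scored with the sacrebleu library; the sentences and system names below are placeholders, assuming each system's outputs and the MSA references are already aligned line by line.

```python
# Sketch: comparing NMT system outputs with corpus-level BLEU via sacrebleu.
# The data here is illustrative only; real evaluation would load the MSA
# references and each system's translations of the same Tunisian Dialect input.
import sacrebleu

# One reference stream: a list of MSA reference sentences (placeholders).
references = [["reference MSA sentence 1", "reference MSA sentence 2"]]

# Hypothetical outputs of each trained system for the same source sentences.
system_outputs = {
    "rnn": ["rnn output 1", "rnn output 2"],
    "cnn": ["cnn output 1", "cnn output 2"],
    "transformer": ["transformer output 1", "transformer output 2"],
}

for name, hypotheses in system_outputs.items():
    # corpus_bleu takes the hypotheses and a list of reference streams.
    bleu = sacrebleu.corpus_bleu(hypotheses, references)
    print(f"{name}: BLEU = {bleu.score:.2f}")
```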
