BERT-based models for classifying multi-dialect Arabic texts

Hassan Fouadi, Hicham Lamtougui, Ali Yahyaouy, Hicham El Moubtahij

doi:10.11591/ijai.v13.i3.pp3437-3446

Abstract

Natural language processing (NLP) is a rapidly developing field. Despite this progress, several dialects of Arabic (DA) remain low-resource languages, which makes DA data difficult for NLP systems to process. One way to address this problem is to train NLP models on social media datasets containing DA texts. The open-access social media datasets outlined in our paper can therefore serve as a valuable resource for developers and researchers working on DA processing. To create our multilingual corpus, we gathered data from various datasets covering different varieties of DA. We use these datasets for three text classification tasks: sentiment classification, topic classification, and dialect identification. Our study contributes to the automated classification of Arabic dialect texts. We investigate and assess various machine learning and deep learning techniques, with a specific focus on BERT-based models. Our experiments show that DarijaBERT and DziriBERT, which were trained on a similar DA, outperform traditional machine learning methods and earlier, more general pre-trained models that were trained on multiple dialects or languages.

Journal: IAES International Journal of Artificial Intelligence (IJ-AI)
Publication Date: Sep 1, 2024
License type: CC BY-SA 4.0
