Investigating the relevance of Arabic text classification datasets based on supervised learning

Ahmad Hussein Ababneh

doi:10.1016/j.jnlest.2022.100160

Open Access

https://doi.org/10.1016/j.jnlest.2022.100160

Abstract

Training and testing models for text classification depends mainly on pre-classified text document datasets. Recently, seven datasets have emerged for Arabic text classification, namely the Single-Label Arabic News Articles Dataset (SANAD), Khaleej, Arabiya, Akhbarona, KALIMAT, Waten2004, and Khaleej2004. This study investigates which of these datasets can provide significant training and fair evaluation for text classification. The investigation uses five well-known and accurate learning models: naive Bayes, random forest, K-nearest neighbor, support vector machine, and logistic regression. We present relevance and time measures for training the models on these datasets so that Arabic language researchers can select an appropriate dataset on a solid basis of comparison. The performance of the five learning models across the seven datasets is measured and compared with the performance of the same models trained on a well-known English language dataset. The analysis of the relevance and time scores shows that training the support vector machine model on Khaleej and Arabiya obtained the most significant results in the shortest amount of time, with an accuracy of 82%.
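The kind of measurement described in the abstract, training a classifier on a labeled dataset and recording both a quality score and the training time, can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes whitespace tokenization, a small hand-written multinomial naive Bayes model with add-one smoothing, accuracy as a stand-in for the relevance measures, and a hypothetical toy corpus that is not drawn from any of the seven datasets. The function names are illustrative only.

```python
import math
import time
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Fit a multinomial naive Bayes model (counts only; smoothing is applied at prediction)."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # label -> token frequencies
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.split()  # assumption: plain whitespace tokenization
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log posterior for `text`."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)
        # Add-one (Laplace) smoothing over the training vocabulary.
        denom = sum(word_counts[label].values()) + len(vocab)
        for token in text.split():
            if token in vocab:  # ignore tokens never seen in training
                score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def evaluate(train_set, test_set):
    """Train on train_set, then report test accuracy and training time in seconds."""
    docs, labels = zip(*train_set)
    start = time.perf_counter()
    model = train_naive_bayes(docs, labels)
    train_seconds = time.perf_counter() - start
    correct = sum(predict(model, text) == label for text, label in test_set)
    return correct / len(test_set), train_seconds


# Hypothetical toy data, not taken from any of the seven datasets.
train_set = [
    ("الفريق فاز في المباراة", "sports"),
    ("اللاعب سجل هدفا في المباراة", "sports"),
    ("البنك رفع سعر الفائدة", "finance"),
    ("ارتفع سعر النفط في السوق", "finance"),
]
test_set = [
    ("الفريق سجل هدفا", "sports"),
    ("سعر الفائدة في البنك", "finance"),
]

accuracy, train_seconds = evaluate(train_set, test_set)
print(f"accuracy={accuracy:.2f} train_time={train_seconds:.6f}s")
```

Running `evaluate` once per dataset and per model would give a table of scores and times that can be compared side by side, which is the shape of comparison the abstract describes. A real study would also need Arabic-specific preprocessing and a held-out split of each dataset, both omitted here.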

Journal: Journal of Electronic Science and Technology
Publication Date: Jun 1, 2022
Citations: 9
License type: cc-by-nc-nd
