Machine Learning for Arabic Text Classification: A Comparative Study

Djelloul Bouchiha, Abdelghani Bouziane, Noureddine Doumi

doi:10.56532/mjsat.v2i4.83

Open Access

Abstract

The ultimate aim of Machine Learning (ML) is to make machines act like humans. In particular, ML algorithms are widely used to classify texts. Text classification is the process of assigning texts to a predefined set of categories based on their content. It contributes to improving information retrieval on the Web. In this paper, we focus on Arabic text classification, since a large community worldwide uses this language. The Arabic text classification process consists of three main steps: preprocessing, feature extraction, and applying an ML algorithm. This paper presents a comparative empirical study to determine which combination of feature extraction technique and ML algorithm performs well on Arabic documents. To this end, we implemented 160 classifiers by combining 5 feature extraction techniques with 32 ML algorithms. We then made these classifiers open access for the benefit of the AI and NLP communities. Experiments were carried out on a large open dataset. The comparison reveals that TFIDF-Perceptron is the best-performing combination.
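
The three steps named in the abstract (preprocessing, feature extraction, ML algorithm) can be illustrated with a minimal sketch of a TF-IDF plus Perceptron combination. This is not the authors' implementation: the abstract does not specify their preprocessing rules, TF-IDF variant, or Perceptron settings, so the diacritic-stripping tokenizer, the smoothed IDF formula, the multiclass update rule, the four toy sentences, and all function names below are assumptions chosen only for illustration.

```python
import math
import re
from collections import Counter

# Step 1: preprocessing (assumed rules: drop diacritics and tatweel,
# then keep runs of Arabic letters as tokens).
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")

def preprocess(text):
    text = _DIACRITICS.sub("", text)
    return re.findall(r"[\u0621-\u064A]+", text)

# Step 2: feature extraction with TF-IDF (assumed smoothed IDF variant).
def fit_idf(token_docs):
    n = len(token_docs)
    df = Counter(term for doc in token_docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf(tokens, idf):
    counts = Counter(t for t in tokens if t in idf)  # ignore unseen terms
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}     # L2-normalised sparse vector

# Step 3: ML algorithm, here a simple multiclass perceptron.
def predict(weights, vec):
    def score(label):
        w = weights[label]
        return sum(w.get(t, 0.0) * v for t, v in vec.items())
    return max(sorted(weights), key=score)  # sorted() makes ties deterministic

def train_perceptron(vectors, labels, epochs=10):
    weights = {label: {} for label in set(labels)}
    for _ in range(epochs):
        for vec, gold in zip(vectors, labels):
            guess = predict(weights, vec)
            if guess != gold:
                # Move the gold class towards the example, the wrong guess away.
                for t, v in vec.items():
                    weights[gold][t] = weights[gold].get(t, 0.0) + v
                    weights[guess][t] = weights[guess].get(t, 0.0) - v
    return weights

# Toy corpus (invented for this sketch, not taken from the paper's dataset).
docs = [
    "فاز الفريق في المباراة",
    "سجل اللاعب هدفا في المباراة",
    "ارتفع سعر النفط في السوق",
    "انخفض سعر الذهب في السوق",
]
labels = ["sport", "sport", "economy", "economy"]

token_docs = [preprocess(d) for d in docs]
idf = fit_idf(token_docs)
vectors = [tfidf(tokens, idf) for tokens in token_docs]
weights = train_perceptron(vectors, labels)

def classify(text):
    return predict(weights, tfidf(preprocess(text), idf))

print(classify("فاز اللاعب في المباراة"))
print(classify("ارتفع سعر الذهب"))
```

Swapping `tfidf` for another feature extractor, or `train_perceptron` for another learner, gives a different combination; pairing every one of 5 extractors with every one of 32 algorithms in this way yields the 5 × 32 = 160 classifiers mentioned in the abstract.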
