Abstract

Text categorization is the process of grouping texts or documents into classes or categories according to their content. The process consists of three phases: preprocessing, feature extraction, and classification. Compared with English, only a few studies have addressed the categorization and classification of Arabic text. For applications such as text classification and clustering, Arabic text representation is a difficult task because the Arabic language is noted for its richness, diversity, and complicated morphology. This paper presents a comprehensive analysis and comparison of studies from the last five years, based on the dataset, year, algorithms, and accuracy achieved. Deep Learning (DL) and Machine Learning (ML) models were used to enhance text classification for the Arabic language. The paper concludes with remarks for future work.
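
The three phases named above can be illustrated with a minimal sketch. It is not the method of any surveyed study: the normalization rules, the bag-of-words features, the Naive Bayes classifier, and the four toy training sentences are all assumptions chosen only to keep the example small and self-contained.

```python
import math
import re
from collections import Counter, defaultdict

# Arabic diacritics (U+064B..U+0652) and tatweel (U+0640).
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")


def preprocess(text):
    """Phase 1: strip diacritics and tatweel, unify alef forms, split on whitespace."""
    text = _DIACRITICS.sub("", text)
    text = re.sub("[\u0622\u0623\u0625]", "\u0627", text)  # alef variants -> bare alef
    return text.split()


def extract_features(tokens):
    """Phase 2: bag-of-words term counts."""
    return Counter(tokens)


class NaiveBayes:
    """Phase 3: multinomial Naive Bayes with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)
        for features, label in zip(docs, labels):
            self.term_counts[label].update(features)
        self.vocab = {t for counts in self.term_counts.values() for t in counts}
        return self

    def predict(self, features):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            total_terms = sum(self.term_counts[label].values())
            score = math.log(n_docs / total_docs)  # log prior
            for term, count in features.items():
                p = (self.term_counts[label][term] + 1) / (total_terms + len(self.vocab))
                score += count * math.log(p)  # log likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy corpus (two sport sentences, two business sentences).
train = [
    ("فاز الفريق في المباراة", "sport"),
    ("سجل اللاعب هدفا في المباراة", "sport"),
    ("ارتفعت أسعار الأسهم في السوق", "business"),
    ("أعلن البنك أرباح الشركة", "business"),
]
model = NaiveBayes().fit(
    [extract_features(preprocess(text)) for text, _ in train],
    [label for _, label in train],
)


def classify(text):
    """Run all three phases on one raw document."""
    return model.predict(extract_features(preprocess(text)))
```

Calling `classify` on a new sentence runs the same preprocessing and feature extraction that were applied to the training data, which is the main design point: the classifier only ever sees normalized tokens.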

Highlights

  • Finding useful knowledge on a given subject in a vast volume of online textual data that is rapidly growing is a difficult challenge

  • The findings showed that deep learning classification models are very promising for the Arabic text classification problem

  • The CNN Arabic news dataset consists of 5,070 documents divided into 6 classes: sport, SciTech, entertainment, Middle East, business, and world [2,3,4]. Some preprocessing is required when dealing with text data, to select features that semantically represent the document and remove other features that are not ...


Summary

Introduction

Finding useful knowledge on a given subject in a vast, rapidly growing volume of online textual data is a difficult challenge. El-Alami et al. (2016) [4] proposed an effective deep learning approach to Arabic Text Categorization (ATC) based on a deep stacked autoencoder that takes word-count vectors as input. They used Restricted Boltzmann Machines (RBMs) in the pre-training stage; to form the deep network, they unrolled the model and applied backpropagation during the fine-tuning stage. El-Alami et al. (2020) [11] proposed an Arabic text categorization method based on Bag-of-Concepts and deep autoencoder representations. It incorporates explicit semantics relying on Arabic WordNet and uses Chi-Square measures to select the most informative features.
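
Chi-Square feature selection, as mentioned above, can be sketched as follows. This is an illustration under stated assumptions, not the procedure of [11]: it scores plain terms rather than Bag-of-Concepts features, uses the standard 2x2 term/class contingency statistic, and ranks each term by its maximum score over classes. The function names and the toy documents (English placeholder tokens for readability) are hypothetical.

```python
from collections import Counter


def chi_square(n_tc, n_t, n_c, n):
    """Chi-square statistic of the 2x2 term/class contingency table.

    n_tc: documents of the class that contain the term
    n_t:  documents that contain the term
    n_c:  documents of the class
    n:    all documents
    """
    a = n_tc            # in class, term present
    b = n_t - n_tc      # outside class, term present
    c = n_c - n_tc      # in class, term absent
    d = n - n_t - c     # outside class, term absent
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def select_features(docs, labels, k):
    """Keep the k terms with the highest chi-square score (max over classes)."""
    n = len(docs)
    class_sizes = Counter(labels)
    doc_freq = Counter()
    class_doc_freq = Counter()
    for tokens, label in zip(docs, labels):
        for term in set(tokens):  # document frequency, not term frequency
            doc_freq[term] += 1
            class_doc_freq[term, label] += 1
    scores = {
        term: max(
            chi_square(class_doc_freq[term, c], df, class_sizes[c], n)
            for c in class_sizes
        )
        for term, df in doc_freq.items()
    }
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical toy corpus: "in" occurs everywhere and carries no class signal.
docs = [
    ["goal", "match", "in"],
    ["team", "match", "in"],
    ["market", "bank", "in"],
    ["market", "shares", "in"],
]
labels = ["sport", "sport", "business", "business"]
top_terms = select_features(docs, labels, 2)
```

A term that appears in every document, such as "in" here, scores zero because its presence is independent of the class, so it is the first to be dropped.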
