Abstract

Text classification is an important topic: the number of electronic documents available online is massive, and text classification aims to assign documents to a set of predefined categories. Far more research has been conducted on English datasets than on Arabic datasets. This research can serve as a reference for researchers who work with Arabic datasets. It applies the most well-known text classification algorithms to an Arabic dataset. The dataset used here is also large compared with most Arabic datasets used in other studies. In addition, this research uses different selection and weighting methods for documents. I expect that researchers working with Arabic datasets will find this work helpful. The algorithms used in this research are naïve Bayesian, support vector machines, artificial neural networks, k-nearest neighbors, the C4.5 decision tree, and the Rocchio classifier.

Highlights

  • The massive number of available electronic documents makes text classification (TC) one of the most critical topics

  • One of the main problems of text classification, for both English and Arabic, is the lack of a general dataset that can be used as a benchmark

  • Readers can find a great deal of research on text classification using English datasets


Summary

Introduction

The massive number of available electronic documents makes text classification (TC) one of the most critical topics (Adel Hamdan, 2011; Raed Abu Zitar, 2011; Adel Hamdan, 2013). Text classification is not an easy process, since a document can contain a great deal of information, and this information may be highly diverse. A large body of research exists on text classification of English datasets (L. Borrajo, 2015; Adel Hamdan, 2016; Adel Hamdan, 2018), but the number of studies and experiments using Arabic datasets is still insufficient. In this research, the author applies the most well-known text classification methods and runs his experiments on an Arabic dataset.

Naïve Bayesian
Support Vector Machine
Artificial Neural Networks
K-Nearest Neighbor
Rocchio Classifier
Arabic Language
10. Related Studies
11. Dataset
12. Experiments and Analysis
13. Conclusion and Future Work
