Abstract

This work applies several classification approaches to Arabic text categorization, using two datasets as test-beds, and compares the performance of the adopted classifiers. Several feature selection methods are also analyzed and evaluated. Selecting the most significant features matters because a very large feature set can degrade text classification performance. The adopted feature selection methods are compared on the task of classifying Arabic documents, and they are further modified by combining the chosen methods. A novel feature selection method is also proposed, which constructs features based on semantic fusion and multiple words (SF-MW), and it is compared against the adopted methods. The experimental results show that the SVM classifier performed best, ahead of the KNN and NB classifiers. Combining the adopted feature selection methods gave better results than using any of them individually. The proposed SF-MW method is promising: it reduced the number of features and achieved higher classification accuracy, with an improvement of about 22% on the two chosen Arabic test-beds, which contain 1246 and 1500 documents respectively. The method is expected to be effective for other Arabic and English datasets as well.

Highlights

  • Text classification can be briefly defined as assigning a document or text to predefined categories or classes based on its content

  • Each document was represented as an instance containing a set of features; that is, each adopted dataset was represented as a matrix containing a set of instances and a set of features describing those instances

  • A threshold value was chosen, and the whole dataset was represented in a form that is easy for the classifiers to process
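
The matrix representation described in the highlights can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes whitespace tokenization, raw term counts as feature values, and a document-frequency threshold; the page does not say what kind of threshold was used. The function name `build_matrix` and the parameter `min_doc_freq` are hypothetical, and the three short documents are toy data.

```python
from collections import Counter


def build_matrix(documents, min_doc_freq=2):
    """Return (features, matrix); matrix[i][j] counts feature j in document i.

    Assumption: a term is kept as a feature only if it occurs in at least
    `min_doc_freq` documents. The source only says "a threshold value was chosen".
    """
    tokenized = [doc.split() for doc in documents]
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Keep terms that reach the threshold; sort for a stable column order.
    features = sorted(t for t, df in doc_freq.items() if df >= min_doc_freq)
    column = {term: j for j, term in enumerate(features)}
    matrix = []
    for tokens in tokenized:
        row = [0] * len(features)
        for term in tokens:
            j = column.get(term)
            if j is not None:  # terms below the threshold are dropped
                row[j] += 1
        matrix.append(row)
    return features, matrix


# Toy corpus (illustrative only, not taken from the paper's datasets).
docs = [
    "الاقتصاد ينمو بسرعة",
    "الاقتصاد يتراجع",
    "الرياضة تنمو بسرعة",
]
features, matrix = build_matrix(docs, min_doc_freq=2)
for row in matrix:
    print(row)
```

Each row is one instance (document) and each column one retained feature, which is the form that classifiers such as SVM, KNN, or NB typically take as input. Raising `min_doc_freq` shrinks the feature set, at the cost of discarding rarer terms.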


Summary

AND RELATED WORK

Text classification can be briefly defined as assigning a document or text to predefined categories or classes based on its content. Several research works have addressed text classification, machine learning algorithms, and feature selection methods. Examples include the following. [2] presented a comparative study of the performance of several classification algorithms using feature selection with and without stemming. [21] proposed a multivariate filter method for feature selection in text classification. Three classification algorithms were implemented with the WEKA toolkit, namely decision tree, Naïve Bayes, and support-vector machines. The authors described several text mining tasks and techniques, including text pre-processing, classification, and clustering.

THE PROCESS OF ARABIC TEXT CLASSIFICATION
THE ADOPTED CLASSIFICATION APPROACHES
APPROACHES OF FEATURE SELECTION
IMPLEMENTATION AND PERFORMANCE EVALUATION OF THE ADOPTED CLASSIFIERS
THE PROPOSED ENHANCEMENT APPROACHES FOR FEATURE SELECTION USING SVM CLASSIFIER
Findings
OF RESULTS AND CONCLUSION
