Abstract

Arabic text categorization is considered one of the more difficult classification problems for machine learning algorithms. Achieving high accuracy in Arabic text categorization depends on the preprocessing techniques used to prepare the data set. This paper therefore investigates the impact of preprocessing methods on the performance of three machine learning algorithms: Naïve Bayesian, DMNBtext, and C4.5. Results show that the DMNBtext learning algorithm achieved higher performance than the other two algorithms in categorizing Arabic text.

Highlights

  • Constructing an automated text categorization system for Arabic articles/documents is difficult because of the unique nature of the Arabic language

  • We focus on exploring the impact of preprocessing on Arabic corpora to improve categorization accuracy by investigating different machine learning approaches, mainly the Naïve Bayesian, DMNBtext and C4.5 algorithms

  • Achieving high accuracy in Arabic text categorization depends on the preprocessing techniques used to prepare the data set


Summary

Introduction

Constructing an automated text categorization system for Arabic articles and documents is difficult because of the unique nature of the Arabic language. The Arabic alphabet consists of 28 letters and is written from right to left, and the language has distinctive morphological and orthographic principles. The amount of text available on the Internet has increased rapidly in the last few years, as many private and public organizations publish text such as documents, news, and books. This creates a vast amount of text information, which makes manual categorization impractical. Developing an automated text categorization (classification) system is therefore important work.
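To make the role of preprocessing concrete, the following is a minimal sketch of the kind of orthographic normalization often applied to Arabic text before classification. The excerpt above does not list the specific preprocessing methods that were evaluated, so the steps shown here (removing diacritics, removing the tatweel elongation character, and unifying alef variants) are assumptions chosen for illustration, and the function names `normalize` and `tokenize` are hypothetical.

```python
import re

# Assumed normalization steps; the source does not specify which were used.
# Arabic diacritics (tashkeel), U+064B through U+0652.
_DIACRITICS = re.compile(r"[\u064B-\u0652]")
# Tatweel (kashida), a purely typographic elongation character.
_TATWEEL = "\u0640"
# Alef with madda, hamza above, and hamza below, all mapped to bare alef.
_ALEF_VARIANTS = re.compile(r"[\u0622\u0623\u0625]")
# Any run of characters outside the basic Arabic letter range separates tokens.
_NON_ARABIC = re.compile(r"[^\u0621-\u064A]+")


def normalize(text: str) -> str:
    """Strip diacritics and tatweel, then unify alef variants."""
    text = _DIACRITICS.sub("", text)
    text = text.replace(_TATWEEL, "")
    return _ALEF_VARIANTS.sub("\u0627", text)


def tokenize(text: str) -> list:
    """Normalize the text, then split it into Arabic-letter tokens."""
    return [tok for tok in _NON_ARABIC.split(normalize(text)) if tok]


if __name__ == "__main__":
    # A word written with elongation, followed by punctuation and a second word.
    sample = "\u0643\u062A\u0640\u0640\u0627\u0628, \u062C\u062F\u064A\u062F!"
    print(tokenize(sample))
```

The resulting tokens could then be turned into word-count features for a classifier such as Naïve Bayes. Whether each normalization step helps or hurts accuracy is the kind of question the paper investigates, so this sketch should not be read as the authors' pipeline.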

