Chapter eleven - Hybrid Arabic classification techniques based on naïve Bayes algorithm for multidisciplinary applications

Mohammed Otair, Somaya Zacout, Laith Abualigah, Mahmoud Omari

doi:10.1016/b978-0-12-820793-2.00004-5

Abstract

Text classification is a major data-mining application that carries considerable weight in the modern digitized world; it supports areas such as renewable energy systems, manufacturing applications, email filtering, digital libraries, and security threats, among many others. In Arabic text classification, existing work remains limited, which leaves considerable room for new research. Arabic is one of the five most spoken languages in the world, so Arabic text is widely available. In this chapter, three algorithms, namely support vector machine (SVM), artificial neural network (ANN), and J48, are each combined with the naïve Bayes algorithm to create new hybrid algorithms: Vote NBSVM, Vote NBANN, and Vote NBJ48. These algorithms are applied to three Arabic text datasets with a total of 32,262 documents, and their performance is measured and compared. The results showed that the voting NBSVM obtained better results than the other algorithms. When combined with the naïve Bayes algorithm using the voting method, the accuracy levels dropped by 2.25% for the ANN algorithm, 6.62% for the SVM algorithm, and 2.52% for the J48 algorithm, while the new algorithm NBJ48 showed highly competitive results and increased the accuracy of J48 by 5.72%. This is especially important given that this algorithm has never been applied to Arabic text.
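To illustrate the idea of combining naïve Bayes with a second classifier by voting, the following is a minimal sketch in plain Python. It assumes that voting means averaging the per-class probabilities produced by each classifier and choosing the highest; the abstract does not state which combination rule the chapter uses, so this rule, the function name `average_vote`, the class labels, and the probability values are all hypothetical.

```python
from typing import Dict, List


def average_vote(distributions: List[Dict[str, float]]) -> str:
    """Return the label with the highest mean probability across classifiers.

    Assumption: each classifier reports a probability per class label.
    A label missing from one classifier's output counts as probability 0.
    """
    if not distributions:
        raise ValueError("need at least one classifier output")
    labels = set().union(*distributions)
    mean = {
        label: sum(d.get(label, 0.0) for d in distributions) / len(distributions)
        for label in labels
    }
    # Iterate over sorted labels so ties are broken alphabetically.
    return max(sorted(mean), key=lambda label: mean[label])


# Hypothetical outputs of two base classifiers for a single document.
nb_output = {"sports": 0.55, "economy": 0.30, "politics": 0.15}
svm_output = {"sports": 0.20, "economy": 0.70, "politics": 0.10}

predicted = average_vote([nb_output, svm_output])
print(predicted)
```

In this toy case the two classifiers disagree, and the averaged distribution favors the class on which the second classifier is most confident. Other combination rules, such as majority voting on predicted labels, would fit the same interface.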

Full Text