Abstract

Given the importance of the Prophet's Hadith for Muslims all over the world, as the second source of Islam after the Qur'an and a fundamental source of legislation in the Islamic community, this study focuses on automatically classifying hadith into different categories according to their content, based on the hadith text. The objective is to build a classifier model that can distinguish hadith categories and predict a hadith's topic, such as prayer, fasting, or zakat, using data mining and machine learning techniques. Many supervised learning algorithms, together with combination methods such as the stacking algorithm, were used to improve classification accuracy. The three best classifiers evaluated were Decision Tree (DT), Random Forest (RF), and Naive Bayes (NB), which reached accuracies of up to 0.965, 0.956, and 0.951, respectively. Binary (Boolean algebra) and TF-IDF term weighting methods were applied to weight each word in the hadith text, and the most significant features in the training dataset were identified using Information Gain (IG) and Chi-square (CHI). The experimental results showed that retraining these classifiers after applying IG and CHI for feature selection gave better accuracy than the previous results. In addition, DT was the best classifier overall: it achieved the highest accuracy in most test cases, with both Boolean and TF-IDF weighting, because it can deal with missing values and identify the most essential features in the training dataset (known as feature engineering).
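The two term weighting schemes named in the abstract can be sketched as follows. This is a minimal pure-Python illustration, not the authors' implementation: it assumes each hadith has already been tokenized into a list of words, uses term frequency normalized by document length together with the unsmoothed IDF formula log(N / df), and uses made-up English tokens in place of real hadith text. The helper names `binary_weights` and `tfidf_weights` are hypothetical.

```python
import math
from collections import Counter


def binary_weights(docs, vocab):
    """Boolean weighting: 1 if the term occurs in the document, else 0."""
    rows = []
    for doc in docs:
        present = set(doc)
        rows.append([1 if term in present else 0 for term in vocab])
    return rows


def tfidf_weights(docs, vocab):
    """TF-IDF weighting: (count / doc length) * log(N / document frequency)."""
    n_docs = len(docs)
    # Document frequency: how many documents contain each vocabulary term.
    df = {term: sum(1 for doc in docs if term in doc) for term in vocab}
    rows = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1  # guard against empty documents
        rows.append([
            (counts[term] / length) * math.log(n_docs / df[term]) if df[term] else 0.0
            for term in vocab
        ])
    return rows


# Made-up, already-tokenized examples standing in for hadith texts.
docs = [
    ["prayer", "dawn", "prayer", "mosque"],
    ["fasting", "ramadan", "dawn"],
    ["zakat", "wealth", "poor"],
]
vocab = sorted({term for doc in docs for term in doc})
binary = binary_weights(docs, vocab)  # presence/absence only
tfidf = tfidf_weights(docs, vocab)    # frequency scaled by rarity across documents
```

Under this TF-IDF variant, a word that appears in every hadith receives weight 0, whereas Boolean weighting still marks it as present; libraries often use smoothed IDF variants and length normalization instead.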

Highlights

  • Data mining is becoming an important technique, especially recently, because of the increasing amount of electronic data available on the World Wide Web

  • Many classifiers were used in this study to identify the one that gives the highest accuracy when classifying hadith into different categories according to their content, making it easier to identify a hadith's topic. The categories follow the Sahih al-Bukhari book, and the classifiers are based on supervised learning algorithms

  • These classifiers were evaluated in two cases, before and after applying Chi-square (CHI) and Information Gain (IG) as feature selection methods; in each case, two term weighting methods were applied: Boolean algebra and term frequency-inverse document frequency (TF-IDF)
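The two feature selection scores can be computed per term as in the sketch below. It is an illustration under stated assumptions, not the study's code: documents are token lists, each term is treated as a binary present/absent feature, IG is measured in bits, and the per-term CHI score is taken as the maximum over categories, which is one common aggregation but not necessarily the one used in the paper. All function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def information_gain(docs, labels, term):
    """IG(t) = H(C) - [P(t) H(C | t) + P(not t) H(C | not t)]."""
    with_t = [y for doc, y in zip(docs, labels) if term in doc]
    without_t = [y for doc, y in zip(docs, labels) if term not in doc]
    n = len(labels)
    conditional = (len(with_t) / n) * entropy(with_t) \
        + (len(without_t) / n) * entropy(without_t)
    return entropy(labels) - conditional


def chi_square(docs, labels, term, category):
    """Chi-square statistic from the 2x2 term/category contingency table."""
    a = b = c = d = 0  # a: t & cat, b: t & other, c: no t & cat, d: no t & other
    for doc, y in zip(docs, labels):
        has_term, in_cat = term in doc, y == category
        if has_term and in_cat:
            a += 1
        elif has_term:
            b += 1
        elif in_cat:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def chi_square_max(docs, labels, term):
    """Assumed aggregation: the term's largest CHI score over all categories."""
    return max(chi_square(docs, labels, term, cat) for cat in set(labels))


def select_top_k(vocab, k, score):
    """Keep the k terms with the highest score (ties keep vocabulary order)."""
    return sorted(vocab, key=score, reverse=True)[:k]


# Toy example with made-up tokens (not data from the study).
docs = [["prayer", "dawn"], ["prayer", "mosque"], ["fasting", "dawn"], ["fasting", "ramadan"]]
labels = ["prayer", "prayer", "fasting", "fasting"]
vocab = sorted({t for doc in docs for t in doc})
top_ig = select_top_k(vocab, 2, lambda t: information_gain(docs, labels, t))
top_chi = select_top_k(vocab, 2, lambda t: chi_square_max(docs, labels, t))
```

In this toy data, "prayer" and "fasting" separate the two categories perfectly and score highest under both criteria, while "dawn" occurs equally in both and scores 0. Only the `k` highest-scoring terms would then be kept as features before retraining the classifiers; the value of `k` here is arbitrary.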


Summary

Introduction

Data mining is becoming an important technique, especially recently, because of the increasing amount of electronic data available on the World Wide Web. Classification techniques, supervised learning algorithms, and combination methods have been used to classify hadith into different categories according to their topic. In this study, combination methods such as the Stacking algorithm are used to improve classification accuracy, with the 10-fold cross-validation method. Stacking has two levels of classification: the first level uses several base classifiers (always more than one), and the second level uses a single classifier, known as the meta classifier, which learns from the outputs of the base classifiers. This algorithm takes more time than the individual classifiers, but it improves classification accuracy.

The associate editor coordinating the review of this manuscript and approving it for publication was Amjad Mehmood.
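The two-level stacking procedure described above, with 10-fold cross-validation producing the meta classifier's training data, can be sketched in plain Python. This is a hedged illustration, not the study's setup: the base learners (a multinomial Naive Bayes and a 1-nearest-neighbour classifier using Jaccard similarity), the Naive Bayes meta classifier, the encoding of base predictions as tokens such as `base0=prayer`, and all class and function names are assumptions chosen to keep the example short and dependency-free.

```python
import math
import random
from collections import Counter


class MultinomialNB:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing over token lists."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior, self.n = Counter(labels), len(labels)
        self.counts = {c: Counter() for c in self.classes}
        for doc, y in zip(docs, labels):
            self.counts[y].update(doc)
        self.vocab = {t for doc in docs for t in doc}
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        v = len(self.vocab)

        def log_posterior(c):
            score = math.log(self.prior[c] / self.n)
            for t in doc:
                if t in self.vocab:  # tokens unseen in training are ignored
                    score += math.log((self.counts[c][t] + 1) / (self.totals[c] + v))
            return score

        return max(self.classes, key=log_posterior)


class OneNearestNeighbour:
    """1-NN classifier using Jaccard similarity between token sets."""

    def fit(self, docs, labels):
        self.train = [(set(doc), y) for doc, y in zip(docs, labels)]
        return self

    def predict(self, doc):
        query = set(doc)

        def jaccard(item):
            union = query | item[0]
            return len(query & item[0]) / len(union) if union else 0.0

        return max(self.train, key=jaccard)[1]


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and split it into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def meta_tokens(predictions):
    """Encode level-1 features, e.g. ['base0=prayer', 'base1=fasting']."""
    return [f"base{j}={p}" for j, p in enumerate(predictions)]


def fit_stacking(docs, labels, base_factories, meta_factory, k=10):
    n = len(docs)
    out_of_fold = [[None] * len(base_factories) for _ in range(n)]
    for fold in k_fold_indices(n, k):
        held_out = set(fold)
        train_idx = [i for i in range(n) if i not in held_out]
        train_docs = [docs[i] for i in train_idx]
        train_labels = [labels[i] for i in train_idx]
        for j, make in enumerate(base_factories):
            model = make().fit(train_docs, train_labels)
            for i in fold:  # level 0 predicts only documents it did not see
                out_of_fold[i][j] = model.predict(docs[i])
    # Level 1: the meta classifier learns from the base classifiers' predictions.
    meta = meta_factory().fit([meta_tokens(p) for p in out_of_fold], labels)
    # Refit the base classifiers on all data for use at prediction time.
    bases = [make().fit(docs, labels) for make in base_factories]
    return bases, meta


def predict_stacking(bases, meta, doc):
    return meta.predict(meta_tokens([b.predict(doc) for b in bases]))


# Toy usage with made-up tokens (not data from the study).
docs = ([["prayer", "mosque", "dawn"] for _ in range(5)] + [["prayer", "ablution"] for _ in range(5)]
        + [["fasting", "ramadan", "dawn"] for _ in range(5)] + [["fasting", "suhoor"] for _ in range(5)])
labels = ["prayer"] * 10 + ["fasting"] * 10
bases, meta = fit_stacking(docs, labels, [MultinomialNB, OneNearestNeighbour], MultinomialNB, k=10)
```

Training the meta classifier on out-of-fold predictions, rather than on predictions for documents the base classifiers have already seen, keeps the second level from learning to trust overfitted base models. It also shows why stacking is slower than a single classifier: each base learner is trained k + 1 times.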
