Text Classification of Indonesian Translated Hadith Using XGBoost Model and Chi-Square Feature Selection

Adiwijaya Adiwijaya, Dita Julaika Putri, Mahendra Dwifebri

doi:10.47065/bits.v4i4.2944

Open Access

Abstract

Aside from the Holy Qur'an, the Hadith is a guide to life that every Muslim must follow. Techniques for classifying texts and sentences, including categorizing hadiths, continue to evolve. Classification models have likewise been developed and optimized; one example is the XGBoost algorithm, which is more optimized than earlier tree-based algorithms. Categorizing hadiths as recommendations, prohibitions, or information can also make it easier for Muslims to study them. This study classifies Indonesian translations of hadith texts into recommendations, prohibitions, and information using the XGBoost algorithm, with TF-IDF for feature extraction and Chi-Square for feature selection. The experiments varied the order of the stopword removal and stemming steps in preprocessing, ran the classification with and without Chi-Square feature selection, and added parameter values during XGBoost modeling. The highest results obtained were 79% accuracy, 79% precision, 78% recall, and a 78% F1-score.
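The feature extraction and selection steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy corpus, the function names (tf_idf, chi_square, select_terms), the smoothed IDF variant, and the presence-based form of the Chi-Square statistic are all assumptions chosen for illustration. The study's actual preprocessing and library choices are not given in the abstract. The XGBoost classification step is left out because it depends on a third-party library; the reduced vectors would be the input to that classifier.

```python
import math
from collections import Counter

# Hypothetical toy corpus (NOT data from the study): tokenized texts with labels.
docs = [
    (["pray", "on", "time"], "recommendation"),
    (["give", "charity", "often"], "recommendation"),
    (["do", "not", "lie"], "prohibition"),
    (["do", "not", "steal"], "prohibition"),
    (["the", "prophet", "said"], "information"),
    (["the", "companions", "said"], "information"),
]


def tf_idf(corpus):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length, idf = ln(N / df) + 1.
    """
    n_docs = len(corpus)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens, _ in corpus for term in set(tokens))
    vectors = []
    for tokens, _ in corpus:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * (math.log(n_docs / df[term]) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def chi_square(term, label, corpus):
    """Chi-Square statistic of a 2x2 term-presence / class-membership table."""
    a = b = c = d = 0
    for tokens, doc_label in corpus:
        has_term = term in tokens
        in_class = doc_label == label
        if has_term and in_class:
            a += 1  # term present, document in class
        elif has_term:
            b += 1  # term present, document in another class
        elif in_class:
            c += 1  # term absent, document in class
        else:
            d += 1  # term absent, document in another class
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def select_terms(corpus, k):
    """Keep the k terms with the highest Chi-Square score over any class."""
    labels = sorted({label for _, label in corpus})
    vocab = sorted({term for tokens, _ in corpus for term in tokens})
    scores = {
        term: max(chi_square(term, label, corpus) for label in labels)
        for term in vocab
    }
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(vocab, key=lambda term: (-scores[term], term))[:k]


vectors = tf_idf(docs)
selected = select_terms(docs, k=4)
# Drop every TF-IDF feature that was not selected.
reduced = [{t: w for t, w in vec.items() if t in selected} for vec in vectors]
```

On this toy corpus the four retained terms are the ones that occur in every document of exactly one class, which is the behaviour Chi-Square selection is meant to reward. Terms that appear in a single document score lower and are dropped.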
