Automatic authorship classification of two ancient books: Quran and Hadith

Halim Sayoud

doi:10.1109/aiccsa.2014.7073263

Abstract

A rigorous, scientific tool for automatic authorship classification has become increasingly important, especially for authenticating ancient documents such as religious or historical books. In this paper, we conduct authorship classification experiments on the Quran and the Hadith to determine whether they could have the same author (i.e., was the Quran written by the Prophet, or only sent down to him, as claimed?). This task, commonly called authorship discrimination, is an important application of authorship classification. It consists of checking whether two texts were written by the same author, using AI (Artificial Intelligence) and TM (Text Mining) techniques. Two main investigations are conducted and presented. In the first, the two books are analyzed in global form. In the second, the two books are segmented into 25 text segments: 14 extracted from the Quran and 11 from the Hadith. The segments are of roughly equal size, approximately 2080 tokens each. Several classifiers are employed: SMO-based Support Vector Machines (SVM), Multi-Layer Perceptron (MLP) and Linear Regression (LR). This work has yielded extremely interesting information on the origins of these ancient books.
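The segmentation step described in the abstract can be illustrated with a minimal sketch. It assumes whitespace tokenization and contiguous, near-equal splits; the abstract does not state how the authors tokenized or where they placed segment boundaries, so the function names `tokenize` and `segment_tokens` and the splitting rule are hypothetical.

```python
def tokenize(text):
    """Assumption: tokens are whitespace-separated words (not the paper's stated method)."""
    return text.split()


def segment_tokens(tokens, target_size=2080):
    """Split tokens into contiguous segments of roughly target_size tokens each.

    The number of segments is the token count divided by target_size, rounded
    to the nearest integer (at least 1). Leftover tokens are spread over the
    first segments so that sizes differ by at most one token.
    """
    if not tokens:
        return []
    n_segments = max(1, round(len(tokens) / target_size))
    base, extra = divmod(len(tokens), n_segments)
    segments = []
    start = 0
    for i in range(n_segments):
        size = base + (1 if i < extra else 0)
        segments.append(tokens[start:start + size])
        start += size
    return segments
```

Each resulting segment could then be turned into a feature vector and passed to a classifier; that part is left out here because the abstract does not specify the features used.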

Full Text