Abstract

In this paper, we deal with the problem of authorship attribution (AA) on short Arabic texts. So, we make a survey on a set of several features and classifiers that are employed for the task of AA. This investigation uses characters, character bigrams, character trigrams, character tetragrams, words, word bigrams and rare words. The AA is ensured by 4 different measures, 3 classifiers (Multi-Layer Perceptron (MLP), Support Vector Machines (SVM) and Linear Regression (LR)) and a new proposed fusion called VBF (i.e. Vote Based Fusion). The evaluation is done on short Arabic texts extracted from the AAAT dataset (AA of Ancient Arabic Texts). Although the task of AA is known to be difficult on short texts, the different results have revealed interesting information on the performances of the features and classification techniques on Arabic text data. For instance, character-based features appear to be better than word-based features for short texts. Furthermore, the proposed VBF fusion provided high performances with an accuracy of 90% of good AA, which is higher than the score of the original classifier using only one feature. Globally, the results of this investigation shed light on the efficiency and pertinency of several features and classifiers in AA of short Arabic texts.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call