Prediction of authorship using various classification algorithms

Chandrika Prasad, Swapnil Jain, Jagadish S Kallimani

doi:10.1109/icacci.2017.8126083

Abstract

Authors can be differentiated by their writing styles. In this paper, we propose features for classifying authors based on their writing styles. Such features include the usage of parts of speech, punctuation marks, word lengths, sentence lengths, and the number of unique words used. Authorship classification is applied in many fields, such as email classification and fraud detection. We propose a module that extracts various stylometric features from text documents by five Victorian authors. The features are represented as numerical vectors, which are used to train decision trees, neural networks, k-nearest neighbours (k-NN), and Naive Bayes classifiers. The chosen texts and authors are of the same period and the same genre. Using the proposed algorithm, a high accuracy rate can be achieved. Features play an important role in training and testing a classifier, and the number of features can affect the accuracy rate both positively and negatively. Therefore, a set of features that provides a high accuracy rate is required, and this set can vary across classification algorithms.
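
As a rough illustration of how a text document can be turned into a numerical feature vector, the following minimal sketch computes four of the feature types named in the abstract (word length, sentence length, unique words, punctuation). It is not the module described in the paper: the function name `stylometric_features`, the feature names, and the regex-based sentence and word splitting are all assumptions made for this example, and part-of-speech features are omitted because they would need an external tagger.

```python
import re
import string


def stylometric_features(text):
    """Return a small dictionary of numeric style features for one document.

    Hypothetical helper for illustration only; the feature set and the
    tokenization rules are assumptions, not the paper's module.
    """
    # Rough sentence split on '.', '!' or '?' (a heuristic, not a real tokenizer).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are runs of letters and apostrophes, lowercased.
    words = re.findall(r"[A-Za-z']+", text.lower())

    # Guard against division by zero on empty input.
    n_words = len(words) or 1
    n_sentences = len(sentences) or 1

    # Count every ASCII punctuation character in the raw text.
    punctuation = sum(1 for ch in text if ch in string.punctuation)

    return {
        "avg_word_length": sum(len(w) for w in words) / n_words,
        "avg_sentence_length": len(words) / n_sentences,
        "unique_word_ratio": len(set(words)) / n_words,
        "punctuation_per_word": punctuation / n_words,
    }


if __name__ == "__main__":
    sample = "The cat sat. The dog ran!"
    for name, value in stylometric_features(sample).items():
        print(f"{name}: {value:.3f}")
```

The values of such a dictionary, taken in a fixed order, form one row of the training matrix for a classifier; which features to keep is exactly the selection question the abstract raises, since the best subset may differ between algorithms.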

Full Text