Abstract

Authors can be distinguished by their writing styles. In this paper, we propose stylometric features for classifying authors by style. These features include the usage of parts of speech and punctuation marks, word lengths, sentence lengths, and the number of unique words used. Authorship classification of this kind is applied in fields such as email classification and fraud detection. We propose a module that extracts various stylometric features from text documents by five Victorian authors. The features are represented as numerical vectors, which are used to train decision trees, neural networks, k-NN and Naive Bayes classifiers. The chosen texts and authors belong to the same period and the same genre. With the proposed approach, a high accuracy rate can be achieved. Features play an important role in training and testing a classifier, and the number of features can affect the accuracy rate both positively and negatively. A set of features that yields a high accuracy rate is therefore required, and this set can vary between classification algorithms.
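
As an illustration of how a text might be turned into such a numerical vector, the following is a minimal sketch and not the module described in the paper. The function names `stylometric_features` and `nearest_author`, the choice of punctuation marks, and the naive sentence splitting are all assumptions made for this example. Part-of-speech features are omitted because they need an external tagger. The classifier shown is a plain 1-nearest-neighbour rule on unscaled features, standing in for the k-NN classifier mentioned above.

```python
import math
import re
from collections import Counter

# Hypothetical choice of punctuation marks to count; the paper does not list them.
PUNCTUATION = ",;:!?"


def stylometric_features(text):
    """Return a fixed-length numeric feature vector for one text.

    Layout: [average word length, average sentence length in words,
    unique-word ratio, then one per-word rate for each mark in PUNCTUATION].
    """
    # Naive sentence split on ., ! and ?; abbreviations are not handled.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are runs of letters, optionally with one internal apostrophe.
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    if not words or not sentences:
        raise ValueError("text must contain at least one word")

    n_words = len(words)
    avg_word_len = sum(len(w) for w in words) / n_words
    avg_sent_len = n_words / len(sentences)
    unique_ratio = len(set(words)) / n_words

    # Punctuation counts are normalised by word count so that
    # texts of different lengths remain comparable.
    char_counts = Counter(text)
    punct_rates = [char_counts[p] / n_words for p in PUNCTUATION]

    return [avg_word_len, avg_sent_len, unique_ratio] + punct_rates


def nearest_author(query, labelled):
    """1-nearest-neighbour rule over (author, vector) pairs (Euclidean distance)."""
    return min(labelled, key=lambda pair: math.dist(query, pair[1]))[0]
```

In use, each training text would be passed through `stylometric_features` and stored together with its author label, and an unseen text would be assigned to the author of the closest stored vector. In practice the features would usually be rescaled first, since sentence length and punctuation rates are on very different scales.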
