Abstract
Authorship identification is the task of determining the author of a document whose authorship is unknown, based on previously available documents. The field has so far been explored primarily for the English language, using several supervised and unsupervised machine learning models together with NLP techniques, but work on regional languages is highly limited. This may be due to the lack of proper datasets and of preprocessing techniques suited to the rich morphological and stylistic features of these languages. In this paper, we apply supervised machine learning models, namely SVM and Naïve Bayes, to Hindi literature to perform authorship analysis on the works of four Hindi authors. We compare and analyze the accuracy obtained with the different models using a bag-of-words approach.