Authorship Identification Using Supervised Learning and n-Grams for Hindi Language

Jagadish S. Kallimani, Zaifa Khan, C. P. Chandrika, Aniket Singh

doi:10.1166/jctn.2020.9058

Abstract

Authorship identification is the task of determining the unknown author of a document from previously available documents of known authorship. The field has so far been explored primarily for English, using several supervised and unsupervised machine learning models together with NLP techniques, but work on regional languages is highly limited. This may be due to a lack of suitable datasets and of preprocessing techniques that can handle the rich morphological and stylistic features of these languages. In this paper, we apply two supervised machine learning models, namely SVM and Naïve Bayes, to Hindi literature written by four Hindi authors to perform authorship analysis. We compare and analyze the accuracy obtained with the different models using a bag-of-words approach.
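The abstract does not specify the tokenization, n-gram range, or model settings, so the following is only a minimal sketch, assuming whitespace tokenization, word n-grams up to bigrams, and add-one (Laplace) smoothing, of how a bag-of-words multinomial Naïve Bayes classifier can attribute a Hindi document to one of several authors. The names `ngrams` and `NaiveBayesAuthorClassifier`, the training sentences, and the labels `A` and `B` are hypothetical illustrations, not the authors' code or corpus.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n_max=2):
    """Bag of word n-grams (1..n_max) built from whitespace-separated tokens."""
    tokens = text.split()
    feats = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats


class NaiveBayesAuthorClassifier:
    """Multinomial Naive Bayes over n-gram counts with add-alpha smoothing."""

    def __init__(self, alpha=1.0, n_max=2):
        self.alpha = alpha
        self.n_max = n_max

    def fit(self, docs, authors):
        self.doc_counts = Counter(authors)       # documents per author (for priors)
        self.feat_counts = defaultdict(Counter)  # n-gram counts per author
        for doc, author in zip(docs, authors):
            self.feat_counts[author].update(ngrams(doc, self.n_max))
        self.vocab = set()
        for counts in self.feat_counts.values():
            self.vocab.update(counts)
        self.totals = {a: sum(c.values()) for a, c in self.feat_counts.items()}
        return self

    def log_scores(self, doc):
        feats = ngrams(doc, self.n_max)
        n_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for author, n_author_docs in self.doc_counts.items():
            score = math.log(n_author_docs / n_docs)  # log prior
            denom = self.totals[author] + self.alpha * vocab_size
            for feat, k in feats.items():
                if feat not in self.vocab:
                    continue  # skip n-grams never seen during training
                count = self.feat_counts[author][feat]
                score += k * math.log((count + self.alpha) / denom)
            scores[author] = score
        return scores

    def predict(self, doc):
        scores = self.log_scores(doc)
        return max(scores, key=scores.get)


# Hypothetical toy corpus: two made-up "authors" A and B, not the paper's data.
train_docs = [
    "वह गाँव की ओर चल पड़ा",
    "गाँव की मिट्टी में खुशबू थी",
    "नगर की सड़कों पर भीड़ थी",
    "नगर में रात को रोशनी थी",
]
train_authors = ["A", "A", "B", "B"]

model = NaiveBayesAuthorClassifier().fit(train_docs, train_authors)
print(model.predict("गाँव की ओर"))  # → A
```

Skipping n-grams that never occur in training is one design choice among several; changing `n_max` or switching to character n-grams changes the feature set. An SVM could be trained on the same n-gram count vectors, which is one way the two model families in the abstract can be compared on identical features.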