Abstract

Authorship attribution (AA) is the task of identifying the author of an unknown text from a set of known authors. It can be viewed as a text classification problem in which documents are classified by the author's writing style rather than by the topic of the text. In this paper, experimental evaluations of authorship attribution were carried out on Telugu text using various types of features and their combinations. Feature vectors were formed for the training set from lexical, syntactic, and structural features and their combinations. A learned model was generated for each of these vectors, and its performance was measured using the F1 metric and accuracy. A large number of features can slow the model down, so features that were irrelevant or only weakly relevant were eliminated from the feature vectors using the chi-square metric. A Support Vector Machine (SVM) classifier was used to generate the learned model for each feature vector. The learned model is then used to assign an anonymous text to one of the known authors.
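The chi-square feature elimination step can be illustrated with a minimal sketch. The abstract does not say how the statistic was computed, so this assumes the common 2x2 presence/absence formulation, scored per author with the maximum kept per feature. The function names (`chi_square`, `score_features`, `select_top_k`) and the toy tokens and author labels are hypothetical and are not data from the paper; the SVM training step is not shown.

```python
from collections import Counter


def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 contingency table.

    a: documents by the author that contain the feature
    b: documents by other authors that contain the feature
    c: documents by the author that lack the feature
    d: documents by other authors that lack the feature
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # The feature occurs in every document or in none: it cannot discriminate.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def score_features(docs, labels):
    """Return {feature: highest chi-square score over all authors}."""
    n_docs = len(docs)
    class_sizes = Counter(labels)
    doc_sets = [set(doc) for doc in docs]
    vocab = set().union(*doc_sets)
    scores = {}
    for feat in vocab:
        best = 0.0
        for author, size in class_sizes.items():
            a = sum(1 for s, y in zip(doc_sets, labels) if y == author and feat in s)
            b = sum(1 for s, y in zip(doc_sets, labels) if y != author and feat in s)
            c = size - a
            d = (n_docs - size) - b
            best = max(best, chi_square(a, b, c, d))
        scores[feat] = best
    return scores


def select_top_k(scores, k):
    """Keep the k highest-scoring features (ties broken alphabetically)."""
    return sorted(scores, key=lambda f: (-scores[f], f))[:k]


# Hypothetical toy corpus: four tokenized documents by two authors.
docs = [
    ["the", "river", "and"],
    ["the", "river", "but"],
    ["the", "hill", "and"],
    ["the", "hill", "but"],
]
labels = ["A", "A", "B", "B"]

scores = score_features(docs, labels)
selected = select_top_k(scores, 2)
```

In this toy corpus, tokens that appear equally often for both authors score zero and are dropped, while tokens used by only one author are retained. In a fuller pipeline, the reduced feature vectors would then be passed to the SVM classifier.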
