Abstract

Authorship attribution (AA) is the task of identifying the authors of anonymous texts. It is framed as a multi-class text classification task and is concerned with writing style rather than topic. The scalability issue in traditional AA studies concerns the effect of data size, that is, the amount of data per candidate author. Most stylometry research focuses on long texts per author, and short texts have not been examined in much depth. This paper investigates AA on Telugu texts written by 12 different authors. Several experiments were conducted on these texts by extracting various lexical and character features of each author's writing style, using word n-grams and character n-grams as the text representation. A support vector machine (SVM) classifier is employed to assign the texts to their authors. AA performance, in terms of F1 measure and accuracy, deteriorates as the number of candidate authors increases and the size of the training data decreases.
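
To illustrate the text representation described above, the following is a minimal sketch of word and character n-gram counting in plain Python. It is not the paper's implementation: the function names `char_ngrams` and `word_ngrams`, the whitespace tokenization, and the English sample sentence are all assumptions made for illustration. Python slices strings by Unicode code point, so for Telugu text a character n-gram built this way may split a consonant from its vowel sign; a real pipeline might segment by grapheme cluster instead.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count overlapping character n-grams (by Unicode code point)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def word_ngrams(text, n):
    """Count overlapping word n-grams, using whitespace tokenization (an assumption)."""
    tokens = text.split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Hypothetical sample text; the paper's Telugu corpus is not reproduced here.
sample = "the cat sat on the mat"
char_counts = char_ngrams(sample, 2)
word_counts = word_ngrams(sample, 2)
```

Count vectors of this kind, computed per document over a shared vocabulary, could then serve as the input features for a multi-class classifier such as an SVM.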
