Authorship Identification Through Stylometry Analysis Using Text Processing and Machine Learning Algorithms

Chandrasekhar Uddagiri, M Shanmuga Sundari

doi:10.1007/978-981-19-8563-8_55

Abstract

The project aims to identify the anonymous author of a defamatory blog post or comment. A dataset of writing samples from a list of authors is acquired, and a custom machine learning model is then used to predict the anonymous author. The main task in this proposal is to build an authorship analysis model that matches a sample to the defamatory blog post and reveals the anonymous author. Text preprocessing methods are employed together with a combination of machine learning algorithms such as the SGD classifier. Stylometry analysis, which plays a major role in this project, characterizes a text through properties such as its length, vocabulary, and style, so the technique can also be used for authorization purposes. The project consists of building a model that can learn an author's style and then scaling the model to handle hundreds of such cases. An accuracy of 79% is obtained with 40 classes, and accuracy was found to improve with fewer classes.
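
As a rough illustration of the kind of stylometric properties mentioned above (text length, vocabulary, style), the following minimal Python sketch computes a few simple features for a single text sample. The function name `stylometric_features` and the specific features chosen are assumptions made for illustration only; the abstract does not specify the feature set the authors actually used.

```python
import re
from typing import Dict


def stylometric_features(text: str) -> Dict[str, float]:
    """Compute a few simple stylometric features for one text sample.

    Hypothetical helper for illustration; not the authors' feature set.
    """
    # Lowercased word tokens; apostrophes are kept so contractions stay whole.
    words = re.findall(r"[a-z']+", text.lower())
    # Split sentences on terminal punctuation and drop empty pieces.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    vocabulary = set(words)
    n_words = len(words)
    return {
        "char_length": float(len(text)),
        "word_count": float(n_words),
        "vocabulary_size": float(len(vocabulary)),
        # Share of distinct words: a common proxy for vocabulary richness.
        "type_token_ratio": len(vocabulary) / n_words if n_words else 0.0,
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "avg_sentence_length": n_words / len(sentences) if sentences else 0.0,
    }


sample = "The cat sat. The cat ran!"
features = stylometric_features(sample)
for name, value in features.items():
    print(f"{name}: {value:.3f}")
```

Feature vectors of this kind, computed per sample, could in principle be passed to a classifier that predicts the author label; how the features are combined with the text preprocessing steps in the actual model is not stated in the abstract.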

Full Text