Abstract

Authorship attribution (AA) is the task of identifying the likely author of a given document by analyzing previous works of the candidate authors. This paper proposes deep neural network models based on continuous skip-gram word embeddings for authorship attribution in Bengali literature. We present a dataset of 2,400 Bengali blog posts from 6 contemporary authors and compare the performance of traditional lexical n-gram based models with that of our proposed approaches. Our best model, a deep convolutional neural network that takes fastText skip-gram word embeddings as features, achieves an accuracy of more than 92% on the held-out dataset and outperforms the traditional models examined in this paper on the Bengali language. The results indicate that features extracted from continuous word embeddings by the hidden layers of deep neural networks work better for authorship attribution on Bengali literature than sparse lexical n-gram features paired with shallow classifiers.
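
As an illustration of the continuous skip-gram idea behind the word embeddings mentioned above, the following minimal sketch shows how (center, context) training pairs can be drawn from a tokenized sentence. It is not the paper's code: the function name `skipgram_pairs`, the window sizes, and the placeholder tokens are assumptions chosen for illustration, and a real pipeline would train the embeddings with fastText on the Bengali corpus rather than enumerate pairs by hand.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return (center, context) pairs for each token and its neighbours.

    A context word is any token at most `window` positions away from the
    center word. The window size is an illustrative assumption.
    """
    pairs: List[Tuple[str, str]] = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)                 # clip at sentence start
        stop = min(len(tokens), i + window + 1)    # clip at sentence end
        for j in range(start, stop):
            if j != i:                             # skip the center word itself
                pairs.append((center, tokens[j]))
    return pairs


# Placeholder tokens standing in for a tokenized blog-post sentence.
tokens = ["a", "b", "c", "d"]
pairs = skipgram_pairs(tokens, window=1)
```

A skip-gram model is trained to predict the context word of each pair from its center word, and the learned input vectors become the dense word embeddings that a downstream classifier can consume in place of sparse n-gram counts.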
