Authorship Attribution in Bengali Literature using Convolutional Neural Networks with fastText’s word embedding model

Hemayet Ahmed Chowdhury, Md Azizul Haque Imon, Md Saiful Islam, Syed Md Hasnayeen

doi:10.1109/icasert.2019.8934492

Abstract

Authorship attribution (AA) is the task of identifying the likely author of a given document by analyzing previous works of the candidate authors. This paper proposes deep neural network models built on continuous skip-gram word embeddings for authorship attribution in Bengali literature. We present a data set of 2400 Bengali blog posts from 6 contemporary authors and compare the performance of traditional lexical n-gram based models with that of our proposed approaches. Our best model, a deep convolutional neural network that takes fastText skip-gram word embeddings as features, achieves an accuracy of more than 92% on the held-out data set and outperforms the other, traditional models examined in this paper on the Bengali language. The results indicate that features extracted by the hidden layers of deep neural networks from continuous word embeddings work better for authorship attribution on Bengali literature than sparse lexical n-gram features paired with shallow classifiers.
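To make the "sparse lexical n-gram" side of this comparison concrete, the following is a minimal sketch of one such baseline: character trigram counts per author, with a document attributed to the author whose profile is closest by cosine similarity. This is an illustration only, not the models evaluated in the paper. The choice of character trigrams, the cosine-similarity rule, the function names, and the toy corpus are all assumptions made here for the example.

```python
from collections import Counter
import math

# Hypothetical toy corpus (placeholder strings, not data from the paper).
TOY_CORPUS = {
    "author_a": ["aaaa bbbb aaaa", "bbbb aaaa"],
    "author_b": ["xxxx yyyy", "yyyy xxxx yyyy"],
}


def char_ngrams(text, n=3):
    """Count overlapping character n-grams: a sparse lexical feature vector."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_profiles(docs_by_author, n=3):
    """Sum the n-gram counts of each author's known documents into one profile."""
    profiles = {}
    for author, docs in docs_by_author.items():
        total = Counter()
        for doc in docs:
            total.update(char_ngrams(doc, n))
        profiles[author] = total
    return profiles


def attribute(text, profiles, n=3):
    """Return the author whose profile is most similar to the text's n-grams."""
    feats = char_ngrams(text, n)
    return max(profiles, key=lambda author: cosine(feats, profiles[author]))


if __name__ == "__main__":
    profiles = build_profiles(TOY_CORPUS)
    print(attribute("aaaa bbbb", profiles))
```

A baseline of this kind sees only surface strings that literally recur between documents. The approach described in the abstract instead feeds dense word embeddings to a convolutional network, so that the features used for classification are learned by the hidden layers rather than counted directly.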