Authorship Classification in a Resource Constraint Language Using Convolutional Neural Networks

Md. Rajib Hossain, Iqbal H. Sarker, M. Ali Akber Dewan, Mohammed Moshiul Hoque, Nazmul Siddique, Md. Nazmul Islam

doi:10.1109/access.2021.3095967

Abstract

Authorship classification is a method of automatically determining the author of a text whose authorship is unknown. Although research on authorship classification has progressed significantly in high-resource languages, it remains at an early stage for resource-constrained languages such as Bengali. This paper presents an authorship classification approach based on Convolutional Neural Networks (CNN) that comprises four modules: embedding model generation, feature representation, classifier training and classifier testing. For this purpose, this work develops a new embedding corpus (named WEC) and a Bengali authorship classification corpus (called BACC-18), which are more robust in terms of author classes and unique words. Using three text embedding techniques (Word2Vec, GloVe and FastText) and combinations of different hyperparameters, 90 embedding models are created in this study. All the embedding models are assessed by intrinsic evaluators, and the 9 best-performing models out of 90 are selected for the authorship classification task. In total, 36 classification models, combining four classifiers (CNN, LSTM, SVM, SGD) with three embedding techniques at 100, 200 and 250 embedding dimensions, are trained with optimized hyperparameters and tested on three benchmark datasets (BACC-18, BAAD16 and LD). Among the models, the optimized CNN with GloVe embeddings achieved the highest classification accuracies: 93.45%, 95.02% and 98.67% on BACC-18, BAAD16 and LD, respectively.
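The model counts in the abstract follow from a simple grid. Reading the abstract and highlights together, the 9 selected embedding models appear to be one per combination of technique (Word2Vec, GloVe, FastText) and dimension (100, 200, 250), and each is paired with each of the four classifiers. The Python sketch below enumerates that grid under this assumption; the names `EMBEDDINGS`, `DIMENSIONS`, `CLASSIFIERS` and the helper functions are hypothetical and are not taken from the paper.

```python
from itertools import product

# Values stated in the abstract. Assumption: the 9 selected embedding
# models are exactly one per (technique, dimension) pair.
EMBEDDINGS = ["Word2Vec", "GloVe", "FastText"]
DIMENSIONS = [100, 200, 250]
CLASSIFIERS = ["CNN", "LSTM", "SVM", "SGD"]


def selected_embedding_models():
    """Return the 9 (technique, dimension) embedding configurations."""
    return list(product(EMBEDDINGS, DIMENSIONS))


def classification_grid():
    """Return every (classifier, technique, dimension) combination."""
    return [(clf, emb, dim)
            for clf in CLASSIFIERS
            for emb, dim in selected_embedding_models()]


if __name__ == "__main__":
    grid = classification_grid()
    print(len(selected_embedding_models()), "embedding models")  # 9
    print(len(grid), "classification models")                   # 36
    for config in grid[:3]:
        print(config)
```

If every one of the 36 configurations is evaluated on all three datasets (BACC-18, BAAD16 and LD), a full run yields 108 (model, dataset) results.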

Highlights

Authorship classification is a long-established research topic in Natural Language Processing (NLP) that deals with identifying the author of a particular text
Based on the intrinsic evaluation performance, a total of 9 top-performing embedding models are selected for the authorship classification task
Three models are chosen from Global Vectors for Word Representation (GloVe), three from FastText and three from Word2Vec embeddings based on the highest Pearson and Spearman correlation scores
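Intrinsic evaluation of word embeddings typically compares the cosine similarity a model assigns to word pairs with human similarity judgments, and scores the agreement with Pearson and Spearman correlation. The Python sketch below shows that computation using only the standard library. The benchmark format, the function names and the selection rule in `select_top` (ranking by the mean of the two correlations and keeping three models per technique) are assumptions for illustration; the paper's exact evaluator datasets and ranking rule may differ.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


def cosine(u, v):
    """Cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def intrinsic_scores(vectors, benchmark):
    """Correlate model similarities with human scores.

    vectors:   dict mapping word -> embedding vector (list of floats)
    benchmark: list of (word1, word2, human_score) tuples; pairs with
               out-of-vocabulary words are skipped (assumed policy).
    """
    model, human = [], []
    for w1, w2, score in benchmark:
        if w1 in vectors and w2 in vectors:
            model.append(cosine(vectors[w1], vectors[w2]))
            human.append(score)
    return pearson(model, human), spearman(model, human)


def select_top(results, k=3):
    """results: dict technique -> list of (model_name, pearson, spearman).

    Keep the k models per technique with the highest mean correlation
    (hypothetical selection rule).
    """
    return {tech: sorted(models, key=lambda m: (m[1] + m[2]) / 2, reverse=True)[:k]
            for tech, models in results.items()}
```

Applying `select_top` with `k=3` to the scores of all Word2Vec, GloVe and FastText models yields three models per technique, i.e. nine in total, matching the count in the highlights.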

Summary

INTRODUCTION

Authorship classification is a long-established research topic in Natural Language Processing (NLP) that deals with identifying the author of a particular text. Authors differ in their writing habits: some prefer particular clauses, specific tenses or distinctive sentence structures, or open and close sentences with particular grammatical constituents. These features can be used to identify the author of a particular piece of writing. Authorship classification is well established for high-resource languages (e.g., English and other European languages) owing to the availability of authorship corpora, feature extractors and classification techniques. It remains a challenging task for a low-resource language like Bengali due to the shortage of linguistic resources and techniques [1].
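As a rough illustration of the surface cues mentioned above, the Python sketch below extracts average sentence length and the words that open and close each sentence, splitting Bengali text on the dari (।) and on `?` and `!`. This hand-crafted profile is not the paper's method, which learns features from word embeddings with a CNN; the function names and the whitespace tokenization are assumptions.

```python
import re
from collections import Counter

# Sentence terminators: the Bengali dari (U+0964) plus "?" and "!".
SENTENCE_END = re.compile(r"[\u0964?!]+")


def sentences(text):
    """Split text into non-empty, whitespace-trimmed sentences."""
    return [s.strip() for s in SENTENCE_END.split(text) if s.strip()]


def style_profile(text):
    """Compute a few simple stylistic cues for one document."""
    sents = sentences(text)
    tokens = [s.split() for s in sents]  # naive whitespace tokenization
    return {
        "num_sentences": len(sents),
        "avg_sentence_len": sum(len(t) for t in tokens) / len(sents) if sents else 0.0,
        "opening_words": Counter(t[0] for t in tokens),
        "closing_words": Counter(t[-1] for t in tokens),
    }


if __name__ == "__main__":
    profile = style_profile("আমি ভাত খাই। তুমি কি খাও? আমি জল খাই।")
    print(profile["num_sentences"], profile["avg_sentence_len"])
    print(profile["opening_words"].most_common(1))
```

Profiles like this can be compared across documents, but they capture only shallow habits; learned embeddings can also reflect vocabulary and context.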

RELATED WORK

18 Train and Test Sets Partition

MODEL ARCHITECTURE

HYPERPARAMETERS IDENTIFICATION AND OPTIMIZATION

EXPERIMENTS

EVALUATION MEASURES

EMBEDDING MODELS EVALUATION

CONCLUSION

Journal: IEEE Access: Practical Innovations, Open Solutions | Publication Date: Jan 1, 2021
Citations: 14 | License type: CC BY 4.0

Similar Papers

Toward Embedding Hyperparameters Optimization: Analyzing Their Impacts on Deep Leaning-Based Text Classification
Md Rajib Hossain ... Mohammed Moshiul Hoque
01 Jan 2023

An Intelligent Metaheuristic Optimization with Deep Convolutional Recurrent Neural Network Enabled Sarcasm Detection and Classification Model
K Kavitha ... Suneetha Chittieni
International Journal of Advanced Computer Science and Applications | VOL. 13
01 Jan 2021

An Improved Model for Medical Forum Question Classification Based on CNN and BiLSTM
Emmanuel Mutabazi ... Weidong Cao
Applied sciences | VOL. 13
26 Jul 2023

Location Property of Convolutional Neural Networks for Image Classification.
Cong Liang ... Haixia Zhang
IEEE transactions on neural networks | VOL. 32
25 Aug 2020