Deep Learning and Super-Hybrid Textual Feature Based Multi-category Thematic Classifier for Punjabi Poetry

Jasleen Kaur, Jatinderkumar Saini

doi:10.1007/978-981-19-2719-5_5

Abstract

Computational linguistic analysis of literary text is a challenging setting for automatic classification. Literary content such as poetry can be categorized by emotion, theme, poet, hidden message, and so on. In this work, we propose a theme-based deep learning classifier for Punjabi poetry. The dataset comprises over 2,000 poems divided into eight thematic categories: Nature, Festival, Linguistic, Patriotic, Romantic, Relation, Philosophy, and Spiritual. The poems were pre-processed through tokenization, stop word removal, stemming, and special symbol removal. We coined the term ‘Bag of Poetry Words’ (BOPW) for the nearly 32,000 unigram tokens extracted in this way. The term frequency (TF) weighting scheme was used to weight the extracted tokens. Four different textual features (lexical, lexical with syntactic, lexical with semantic, and a super hybrid) were tested to develop a classifier based on poetry elements. Fourteen machine learning algorithms were evaluated with these textual features: Adaboost (AB), Bagging (BG), Bi-directional Long Short Term Memory (Bi-LSTM), C4.5, Gradient Boosting (GB), Hyperpipes (HP), K-nearest neighbor (KNN), Long Short Term Memory (LSTM), Naïve Bayes (NB), PART, Random Forest (RF), Support Vector Machine (SVM), Voting Feature Intervals (VFI), and ZeroR. These algorithms were grouped into baseline learners, ensemble learners, and deep learners. Among the baseline learners, SVM performed best, reaching its highest accuracy (76.14%) with the super hybrid textual feature. Among the ensemble learners, GB performed best, with an accuracy of 64.10%. Among the deep learners, Bi-LSTM reported the highest accuracy, 80.32%, using the super hybrid feature.
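To make the Bag of Poetry Words and TF weighting steps concrete, the following is a minimal sketch in Python, not the authors' implementation. It assumes whitespace tokenization, treats TF as the raw count of a token within a poem (the abstract does not specify the exact definition), and omits stemming. The stop-word list, the symbol set, and the function names `tokenize`, `build_vocabulary`, and `tf_vector` are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical, tiny stop-word list and symbol set (illustration only;
# the paper's actual resources are not given in the abstract).
STOP_WORDS = {"ਦਾ", "ਦੇ", "ਦੀ", "ਹੈ", "ਤੇ"}
SYMBOLS = "।,.!?;:\"'()-"
_SYMBOL_TABLE = str.maketrans(SYMBOLS, " " * len(SYMBOLS))


def tokenize(poem):
    """Replace symbols with spaces, split on whitespace, drop stop words."""
    cleaned = poem.translate(_SYMBOL_TABLE)
    return [tok for tok in cleaned.split() if tok not in STOP_WORDS]


def build_vocabulary(poems):
    """Map every distinct unigram token in the corpus to a column index."""
    tokens = sorted({tok for poem in poems for tok in tokenize(poem)})
    return {tok: idx for idx, tok in enumerate(tokens)}


def tf_vector(poem, vocab):
    """Return the raw term-frequency vector of one poem over the vocabulary."""
    counts = Counter(tokenize(poem))
    vector = [0] * len(vocab)
    for tok, count in counts.items():
        if tok in vocab:  # ignore tokens unseen when the vocabulary was built
            vector[vocab[tok]] = count
    return vector


if __name__ == "__main__":
    poems = ["ਫੁੱਲ ਦਾ ਰੰਗ, ਫੁੱਲ ਦੀ ਖੁਸ਼ਬੂ।"]
    vocab = build_vocabulary(poems)
    print(vocab)
    print(tf_vector(poems[0], vocab))
```

Each poem becomes one row of counts over the shared vocabulary. Vectors of this kind correspond to the lexical feature, which the syntactic and semantic features described above would then extend.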

Full Text