A learning approach towards metre-based classification of similar Hindi poems using proposed two-level data transformation

Komal Naaz, Niraj Kumar Singh

doi:10.1093/llc/fqad011

Abstract

With advances in technology and the digitalization of resources, problems in the humanities have not remained untouched by computation. Automatic poetry classification is now a well-defined problem that can be solved using various approaches, of which mood-based classification is among the most popular. We propose a learning approach towards metre-based classification of Hindi metrical poetry. The state-of-the-art model for metre-based poetry classification uses a rule-based approach, whereas the proposed system uses learning models to perform classification. Feature extraction and classification are the two main components of text classification in natural language processing. Feature extraction transforms text into machine-readable numbers, which are then submitted to classification models. Poems in their natural form are unsuitable as input to learning-based algorithms. However, transforming the data into a suitable form and selecting a fixed number of features from it (feature extraction) makes classification possible with a machine learning approach, which had not previously been attempted and can act as a benchmark for this area of research. The article deals with six popular and similar types of Hindi poems. The collected data are processed to form an early dataset that undergoes two levels of data transformation and feature engineering, resulting in the pre-processed dataset. The pre-processed dataset is then fed as input to selected machine learning models (Bernoulli Naïve Bayes, k-nearest neighbour, random forest, and support vector machine), producing classification results with a best accuracy of 99%. These results then undergo a post-processing step based on observed misclassifications.
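The pipeline described above (transform each poem line into a fixed number of numeric features, then pass it to a learning model) can be illustrated with a minimal sketch. The abstract does not specify the two transformation levels, so both levels shown here are assumptions: level 1 maps an already scanned line of light ("L") and heavy ("G") syllables to bits, and level 2 truncates or pads that sequence to a fixed length. The patterns, the labels `metre_A` and `metre_B`, and the value of `MAX_LEN` are invented for illustration and are not taken from the article's dataset. A small k-nearest neighbour classifier, one of the four model families named above, is written out by hand to keep the example self-contained.

```python
from collections import Counter

MAX_LEN = 16  # assumed fixed number of features per line (hypothetical value)


def to_binary(pattern):
    """Level 1 (assumed): map a scanned line to bits, 'G' (heavy) -> 1, 'L' (light) -> 0."""
    return [1 if ch == "G" else 0 for ch in pattern]


def to_fixed_length(bits, size=MAX_LEN):
    """Level 2 (assumed): truncate or zero-pad so every line has `size` features."""
    return (bits + [0] * size)[:size]


def featurize(pattern):
    """Apply both assumed transformation levels to one scanned line."""
    return to_fixed_length(to_binary(pattern))


def hamming(a, b):
    """Count the positions at which two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train, query, k=3):
    """Return the majority label among the k training vectors closest to `query`."""
    nearest = sorted(train, key=lambda item: hamming(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy training data: invented scansion patterns with hypothetical labels.
TRAIN = [
    (featurize("GLGGLGLG"), "metre_A"),
    (featurize("GLGGLGGG"), "metre_A"),
    (featurize("GLGLLGLG"), "metre_A"),
    (featurize("LLGLLGLLG"), "metre_B"),
    (featurize("LLGLLGLLL"), "metre_B"),
    (featurize("LLGLGGLLG"), "metre_B"),
]

prediction = knn_predict(TRAIN, featurize("LLGLLGGLG"))
print(prediction)
```

Binary features are used here because they suit both a Hamming-distance nearest-neighbour vote and a Bernoulli-style model; whether the article encodes its features this way is not stated in the abstract.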
