Abstract

Automatic classification of poetic content is challenging from a computational linguistics point of view. For a library recommendation system, poems can be grouped along different dimensions, for example poet, era, assumptions, and topic. In this work, a content-based Punjabi poetry classifier was built using the Weka toolset. Four classes were manually populated with 2,034 poems: the NAFE, LIPA, RORE, and PHSP classes comprise 505, 399, 529, and 601 poems, respectively. These poems were passed through several pre-processing sub-stages: tokenisation, noise removal, stop word removal, and special symbol removal. A total of 31,938 tokens were extracted after the pre-processing layer and weighted using the term frequency (TF) and term frequency-inverse document frequency (TF-IDF) weighting schemes. Based on the poetic elements of poetry, two poetic features (orthographic and phonemic) were tested when building the classifier with machine learning algorithms. Naive Bayes, support vector machine (SVM), hyper pipes, and K-nearest neighbour algorithms were each tested with the two poetic features. The results revealed that adding poetic features does not improve the performance of the Punjabi poetry classification task. With poetic features, the best-performing algorithm is SVM, and the highest accuracy (71.98%) is achieved with orthographic features.
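The pre-processing and weighting stages described in the abstract can be illustrated with a minimal sketch. This is not the authors' Weka pipeline: the stop-word list, the function names (`preprocess`, `tf_weights`, `tfidf_weights`), and the toy English input are all assumptions made for illustration, and the TF-IDF formula is the common textbook variant (raw count times log(N / document frequency)), which may differ from the exact weighting Weka applies.

```python
import math
import unicodedata
from collections import Counter

# Hypothetical stop-word list; the abstract does not give the one actually used.
STOP_WORDS = {"the", "of", "and", "a"}


def strip_symbols(token):
    """Drop punctuation, symbols, and digits (Unicode categories P*, S*, N*)."""
    return "".join(
        ch for ch in token if unicodedata.category(ch)[0] not in {"P", "S", "N"}
    )


def preprocess(text):
    """Tokenise on whitespace, remove noise characters, then remove stop words.

    Whitespace splitting is used (rather than a \\w+ regex) so that scripts
    with combining vowel signs, such as Gurmukhi, are not split mid-word.
    """
    tokens = (strip_symbols(raw) for raw in text.lower().split())
    return [t for t in tokens if t and t not in STOP_WORDS]


def tf_weights(tokens):
    """Term frequency: raw count of each term in one document."""
    return dict(Counter(tokens))


def tfidf_weights(docs_tokens):
    """TF-IDF per document: count * log(N / df), textbook variant (assumed)."""
    n_docs = len(docs_tokens)
    doc_freq = Counter()
    for tokens in docs_tokens:
        doc_freq.update(set(tokens))  # count each term once per document
    return [
        {
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in Counter(tokens).items()
        }
        for tokens in docs_tokens
    ]


if __name__ == "__main__":
    poems = ["The river, the river sings!", "A song of the field."]
    tokenised = [preprocess(p) for p in poems]
    print(tokenised)
    print([tf_weights(t) for t in tokenised])
    print(tfidf_weights(tokenised))
```

Under this variant, a term that occurs in every document receives a TF-IDF weight of zero, which is why TF-IDF tends to down-weight words shared across all four classes while plain TF does not.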
