Abstract

Urdu literature has a rich tradition of poetry with many forms, one of which is the Ghazal. Urdu poetic structures are mainly of Arabic origin, and their sentence structure is complex and differs from everyday language, which makes poetry hard to classify. Our research focuses on identifying the poet of a ghazal given as input. No previous work has addressed this task. The two main factors that help categorize and classify a given text are its content and its writing style. Urdu poets such as Mirza Ghalib, Mir Taqi Mir, Iqbal and many others each have a distinct writing style and distinct topics of interest. Our model accounts for these two factors and classifies ghazals using different classification models: SVM (Support Vector Machines), Decision Tree, Random Forest, Naïve Bayes and KNN (K-Nearest Neighbors). Furthermore, we have applied feature selection techniques, namely chi-square and L1-based feature selection. For experimentation, we have prepared a dataset of about 4000 ghazals. We have also compared the accuracy of the different classifiers and report the best results obtained on the collected dataset.

Highlights

  • Urdu poetry has many forms, such as Ghazal, Nazm, Hamd and Manaqbat

  • This paper presents evaluation experiments on the performance of various machine learning algorithms, such as SVM [4,5], KNN [6], Random Forest [7], Decision Tree [7] and Naïve Bayes [8,9], for Urdu poet identification

  • We have focused on poet identification in Ghazals only

Summary

Introduction

Urdu poetry has many forms, such as Ghazal, Nazm, Hamd and Manaqbat, but this study focuses on poet identification in Ghazals only. Since no similar work exists for the Urdu language, we had to prepare the data for the first time. In the first section, we scraped Urdu ghazal data from various sites, tagged each ghazal with its poet and stored the result in a database. We then applied a few preprocessing techniques to convert the data into an acceptable format. Afterward, in the Poet Identification section, we applied feature selection algorithms to keep only the important features, and then applied various machine learning classifiers to identify the poets.
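The feature selection and classification steps can be illustrated with a small sketch. It is not the authors' implementation: it uses only the Python standard library, it covers just two of the techniques named in the paper (chi-square feature scoring and a Naïve Bayes classifier), and the function names, the tokens and the poet labels `poet_a` and `poet_b` are invented placeholders rather than data from the 4000-ghazal dataset.

```python
import math
from collections import Counter, defaultdict


def chi2_scores(docs, labels):
    """Chi-square statistic of each term's presence/absence against the poet label."""
    n = len(docs)
    class_count = Counter(labels)
    doc_sets = [set(d) for d in docs]
    vocab = sorted(set().union(*doc_sets))
    scores = {}
    for term in vocab:
        # Number of documents per class that contain the term.
        present = Counter(lab for s, lab in zip(doc_sets, labels) if term in s)
        n_present = sum(present.values())
        score = 0.0
        for c, n_c in class_count.items():
            cells = ((present[c], n_present), (n_c - present[c], n - n_present))
            for observed, row_total in cells:
                expected = row_total * n_c / n
                if expected > 0:
                    score += (observed - expected) ** 2 / expected
        scores[term] = score
    return scores


def select_top_k(scores, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return set(ranked[:k])


def train_nb(docs, labels, vocab):
    """Multinomial Naive Bayes with add-one smoothing, restricted to `vocab`."""
    class_count = Counter(labels)
    term_counts = defaultdict(Counter)
    for doc, lab in zip(docs, labels):
        term_counts[lab].update(t for t in doc if t in vocab)
    model = {}
    for c, n_c in class_count.items():
        total = sum(term_counts[c].values())
        log_prior = math.log(n_c / len(docs))
        log_like = {
            t: math.log((term_counts[c][t] + 1) / (total + len(vocab)))
            for t in vocab
        }
        model[c] = (log_prior, log_like)
    return model


def predict_nb(model, doc):
    """Return the label with the highest log posterior for a tokenized document."""
    def score(c):
        log_prior, log_like = model[c]
        return log_prior + sum(log_like[t] for t in doc if t in log_like)
    return max(sorted(model), key=score)


# Toy, invented data: each document is a list of tokens from one "ghazal".
docs = [
    ["dil", "gham", "ishq"],
    ["dil", "ishq", "shab"],
    ["gham", "dil", "shab"],
    ["khudi", "shaheen", "shab"],
    ["khudi", "parwaz", "shaheen"],
    ["shaheen", "khudi", "shab"],
]
labels = ["poet_a", "poet_a", "poet_a", "poet_b", "poet_b", "poet_b"]

scores = chi2_scores(docs, labels)
selected = select_top_k(scores, k=3)
model = train_nb(docs, labels, selected)
print(sorted(selected))
print(predict_nb(model, ["dil", "shab"]))
```

In this toy setup a token that occurs equally often for both labels (here `shab`) receives a chi-square score of zero and is dropped, while tokens used by only one label are kept. That is the sense in which feature selection keeps "only the important features" before classification.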

