Abstract

The growing amount of Punjabi content available online makes this textual data difficult to manage. To manage it, the data must be classified into classes. Punjabi poetry classification is a text classification problem, and the pre-processing phase plays an important role in the classification task. The pre-processing phase is divided into sub-phases: tokenisation, unique word identification and term frequency calculation, removal of special symbols and punctuation marks, and stop word identification. This paper discusses the importance of each sub-phase for Punjabi poetry and concentrates on identifying stop words from poetry and news articles. The 256 stop words identified from poems and news articles are released for public use.
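
The pre-processing sub-phases listed above can be illustrated with a minimal Python sketch. It is not the paper's implementation. The function names, the punctuation set (ASCII punctuation plus the danda marks "।" and "॥"), and the document-frequency threshold used to flag candidate stop words are all assumptions made for illustration; the abstract does not state how the 256 stop words were selected.

```python
import string
from collections import Counter

# Assumed punctuation set: ASCII punctuation plus the danda marks used in Gurmukhi text.
_PUNCT_TABLE = str.maketrans({ch: " " for ch in string.punctuation + "।॥"})


def tokenize(text):
    """Replace special symbols and punctuation with spaces, then split on whitespace."""
    return text.translate(_PUNCT_TABLE).split()


def term_frequencies(documents):
    """Count how often each unique word occurs across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    return counts


def candidate_stop_words(documents, min_doc_fraction=0.8):
    """Return words that occur in at least min_doc_fraction of the documents.

    The threshold is a hypothetical heuristic, not the paper's selection criterion.
    """
    doc_counts = Counter()
    for doc in documents:
        doc_counts.update(set(tokenize(doc)))
    threshold = min_doc_fraction * len(documents)
    return sorted(word for word, n in doc_counts.items() if n >= threshold)
```

Calling `candidate_stop_words` on a list of poems and news articles would return the words shared by most documents. In practice such a list would still need manual review, since a frequent word in poetry is not necessarily a stop word.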
