Abstract

This paper presents a sentiment analysis approach based on Markov chains for predicting the sentiment of Urdu tweets. Sentiment analysis has been a focus of the natural language processing (NLP) research community over the past few decades. The reason for this growing interest is twofold. First, the complexity of identifying sentiment in unstructured text makes it a challenging problem for the research community. Second, its wide variety of applications, ranging from industry to academia, has made it a popular area of NLP research. However, very little work has been done on sentiment analysis for low-resource languages, which include Urdu, Bengali, Hindi, and other Asian languages. This work focuses on developing a 3-class (positive, negative, and neutral) sentiment classification model for the Urdu language. The experiments were conducted on a labeled corpus of Urdu tweets extracted from Twitter. One of the main contributions of this research is the development of a large labeled corpus of Urdu tweets for sentiment analysis; to the best of our knowledge, no such corpus is publicly available for Urdu. The labeled dataset is available on GitHub (https://github.com/zarmeen92/urdutweets). Furthermore, the results show that the proposed approach outperforms lexicon-based and traditional machine learning-based approaches to sentiment analysis.
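The abstract does not spell out how the Markov chains are used for classification. The following is a minimal sketch of one common formulation, offered as an assumption rather than the authors' method. It trains one first-order Markov chain over word transitions per sentiment class and assigns a tweet to the class whose chain, weighted by the class prior, gives the highest likelihood. Whitespace tokenization, add-one smoothing, the `MarkovChainClassifier` name, and the romanized toy tweets are all hypothetical choices made for illustration.

```python
import math
from collections import Counter, defaultdict

START = "<s>"  # hypothetical start-of-tweet token


class MarkovChainClassifier:
    """Illustrative sketch: one first-order Markov chain per sentiment class.

    Assumptions (not from the paper): whitespace tokenization and
    add-alpha smoothing of the transition probabilities.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.priors = {}       # class -> P(class)
        self.transitions = {}  # class -> {previous token: Counter(next token)}
        self.vocab = set()

    def fit(self, texts, labels):
        class_counts = Counter(labels)
        self.priors = {c: n / len(labels) for c, n in class_counts.items()}
        self.transitions = {c: defaultdict(Counter) for c in class_counts}
        for text, label in zip(texts, labels):
            tokens = [START] + text.split()
            self.vocab.update(tokens[1:])
            # Count each (previous token -> next token) transition for this class.
            for prev, nxt in zip(tokens, tokens[1:]):
                self.transitions[label][prev][nxt] += 1
        return self

    def log_score(self, text, label):
        """log P(class) plus the sum of smoothed log transition probabilities."""
        tokens = [START] + text.split()
        v = len(self.vocab) + 1  # +1 reserves probability mass for unseen tokens
        score = math.log(self.priors[label])
        for prev, nxt in zip(tokens, tokens[1:]):
            row = self.transitions[label].get(prev, Counter())
            prob = (row[nxt] + self.alpha) / (sum(row.values()) + self.alpha * v)
            score += math.log(prob)
        return score

    def predict(self, text):
        # Pick the class whose chain assigns the highest log score.
        return max(self.priors, key=lambda c: self.log_score(text, c))


# Hypothetical romanized-Urdu toy data, invented for illustration only.
train_texts = [
    "film bohat achi hai", "khana bohat acha hai",
    "film bohat buri hai", "khana bohat bura hai",
    "aaj match hai", "kal meeting hai",
]
train_labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

clf = MarkovChainClassifier().fit(train_texts, train_labels)
for tweet in ["bohat acha hai", "bohat bura hai", "aaj match hai"]:
    print(tweet, "->", clf.predict(tweet))
# bohat acha hai -> positive
# bohat bura hai -> negative
# aaj match hai -> neutral
```

Scoring in log space avoids numeric underflow on longer tweets. A real Urdu system would likely also need script-aware tokenization and text normalization, which this sketch leaves out.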
