Sentiment Analysis on Urdu Tweets Using Markov Chains

Zarmeen Nasim, Sayeed Ghani

doi:10.1007/s42979-020-00279-9

Abstract

This paper presents a sentiment analysis approach based on Markov chains for predicting the sentiment of Urdu tweets. Sentiment analysis has been a focus of the natural language processing (NLP) research community for the past few decades. The reason for this growing interest is twofold. First, the complexity involved in identifying sentiment from unstructured text makes it a challenging problem for the research community. Second, its wide variety of applications, ranging from industry to academia, has made it a popular research area in NLP. However, very little work has been done on sentiment analysis for low-resource languages, which include Urdu, Bengali, Hindi, and other Asian languages. This work focuses on developing a 3-class (positive, negative, and neutral) sentiment classification model for the Urdu language. The experiments were conducted on a labeled corpus of Urdu tweets collected from Twitter. One of the main contributions of this research is the development of a large labeled corpus of Urdu tweets for sentiment analysis. To the best of our knowledge, no such corpus is publicly available for the Urdu language. The labeled dataset is available on GitHub (https://github.com/zarmeen92/urdutweets). Furthermore, the results showed that the proposed approach outperforms lexicon-based and traditional machine learning-based approaches to sentiment analysis.

Full Text