Abstract

Sentiment analysis is a widely researched area because of its many applications in customer service, brand monitoring, and market research. Automatic sentiment classification is an important but challenging task. Unlike sentiment analysis for English, sentiment analysis for low-resource languages such as Urdu remains under-explored. Most existing work on Urdu sentiment analysis is domain-dependent: models are typically trained and tested on the same dataset, which covers only a limited set of domains. However, sentiment is expressed differently across domains, and manually annotating datasets for every possible domain is infeasible. A sentiment classifier trained on annotated data from one domain performs poorly when tested on another, because terms that appear in the source domain (training data) may not appear in the target domain (testing data). In this paper, we present a baseline method for cross-domain sentiment analysis in Urdu using two different domains. Features are extracted using n-grams and word embeddings, and sentiment classification is performed with machine learning and deep learning classifiers. The proposed method achieves accuracy, precision, recall, and F1 scores of 0.77, 0.83, 0.68, and 0.75, respectively.
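
The cross-domain setting described above (train on a source domain, test on a different target domain, with n-gram features) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes whitespace tokenization, unigram plus bigram features, and a multinomial Naive Bayes classifier as a stand-in for the unspecified "machine learning" classifiers; word embeddings and deep learning models are not covered. All names (`NGramNaiveBayes`, `oov_rate`, `binary_metrics`) are hypothetical, and the toy sentences are English stand-ins for Urdu reviews, not data from the paper.

```python
import math
from collections import Counter


def ngrams(text, n_values=(1, 2)):
    """Word n-grams of `text` for each n in `n_values` (assumes whitespace tokenization)."""
    tokens = text.split()
    grams = []
    for n in n_values:
        grams.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return grams


class NGramNaiveBayes:
    """Hypothetical stand-in classifier: multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.log_prior, self.counts, self.totals = {}, {}, {}
        self.vocab = set()
        for c in self.classes:
            docs = [t for t, y in zip(texts, labels) if y == c]
            self.log_prior[c] = math.log(len(docs) / len(texts))
            self.counts[c] = Counter(g for d in docs for g in ngrams(d))
            self.totals[c] = sum(self.counts[c].values())
            self.vocab.update(self.counts[c])
        return self

    def predict(self, text):
        # n-grams never seen in the source domain carry no evidence and are skipped;
        # this is where source/target vocabulary mismatch hurts.
        grams = [g for g in ngrams(text) if g in self.vocab]
        v = len(self.vocab)

        def log_score(c):
            return self.log_prior[c] + sum(
                math.log((self.counts[c][g] + 1) / (self.totals[c] + v)) for g in grams
            )

        return max(self.classes, key=log_score)


def oov_rate(source_texts, target_texts):
    """Fraction of target-domain n-gram occurrences absent from the source vocabulary."""
    source_vocab = {g for t in source_texts for g in ngrams(t)}
    target_grams = [g for t in target_texts for g in ngrams(t)]
    return sum(g not in source_vocab for g in target_grams) / len(target_grams)


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 with respect to the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall, f1(precision, recall)


# Hypothetical toy data: English stand-ins for two review domains (1 = positive).
source_texts = ["great movie loved it", "excellent acting great story",
                "boring movie hated it", "terrible acting boring story"]
source_labels = [1, 1, 0, 0]
target_texts = ["great phone loved it", "excellent battery",
                "terrible battery hated it", "great price but battery died"]
target_labels = [1, 1, 0, 0]

# Train on the source domain only, then evaluate on the target domain.
model = NGramNaiveBayes().fit(source_texts, source_labels)
predictions = [model.predict(t) for t in target_texts]
accuracy, precision, recall, f1_score = binary_metrics(target_labels, predictions)
print("predictions:", predictions)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f} f1={f1_score:.2f}")
print(f"target OOV rate: {oov_rate(source_texts, target_texts):.2f}")
```

On this toy data the last target review is misclassified because "battery" and "died" never occur in the source domain, leaving only the source-positive word "great" as evidence. As a consistency check on the abstract's figures, `f1(0.83, 0.68)` gives about 0.7475, which rounds to the reported F1 of 0.75.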
