Abstract

Social media websites are now major platforms for sharing opinions, with millions of people using them each day. Twitter is one of the most popular social media platforms on the internet. With support for multiple languages on social media, individuals often share their opinions in their native language. Individuals and organizations are interested in the opinions people express online to inform their future business and product strategies. In this paper, we perform sentiment analysis of the Sindhi language using supervised machine learning techniques. The dataset we developed consists of tweets in Sindhi. An automated approach based on a predefined lexicon assigns positive or negative polarity to each tweet, giving two classes. Pre-processing involves removing non-Sindhi words, extra white space, and punctuation, followed by tokenization of the text. After data cleaning and labelling, we applied four supervised machine learning techniques to the Sindhi tweets dataset: Support Vector Machine (SVM), Naive Bayes, Decision Tree, and K-nearest neighbour. The results showed that Decision Tree and K-nearest neighbour gave the best accuracy on the Sindhi tweets dataset, followed by SVM.
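
The pre-processing and lexicon-based labelling steps described above could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `preprocess` and `label`, the use of the Unicode Arabic block (U+0600 to U+06FF) as a stand-in for "Sindhi words", the tie-handling rule, and the two-word example lexicon are all assumptions made for this sketch.

```python
import re
import string

# Assumption: ASCII punctuation plus common Arabic-script punctuation marks.
PUNCT = string.punctuation + "،؛؟۔"
PUNCT_RE = re.compile("[" + re.escape(PUNCT) + "]")

# Assumption: a token counts as "Sindhi" if it consists only of characters
# from the Unicode Arabic block. This is a script test, not a language test,
# so Urdu or Arabic words (and Arabic-Indic digits) would also pass.
SINDHI_WORD_RE = re.compile(r"[\u0600-\u06FF]+")


def preprocess(tweet):
    """Remove punctuation and extra white space, tokenize, drop non-Sindhi tokens."""
    no_punct = PUNCT_RE.sub(" ", tweet)
    tokens = no_punct.split()  # split() with no argument also collapses white space
    return [t for t in tokens if SINDHI_WORD_RE.fullmatch(t)]


def label(tokens, positive, negative):
    """Assign a polarity by counting lexicon hits (hypothetical scoring rule)."""
    score = sum((t in positive) - (t in negative) for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return None  # tie or no lexicon hit; the abstract does not say how these are handled


# Illustrative placeholder lexicon, not the lexicon used in the paper.
POSITIVE = {"سٺو"}
NEGATIVE = {"خراب"}

tokens = preprocess("Hello  سٺو ، world!")
polarity = label(tokens, POSITIVE, NEGATIVE)
```

The labelled token lists would then be converted to feature vectors and passed to the classifiers named in the abstract; the feature representation is not specified there, so it is left out of this sketch.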
