Abstract

Over the last decade, sentiment analysis of languages such as English and Chinese has received particular attention, while resource-poor languages such as Urdu have been mostly ignored by the research community; this research focuses on Urdu. Data is acquired from various blogs covering about 14 different genres and is annotated with the help of human annotators. Three well-known classifiers, namely Support Vector Machine, Decision Tree and k-Nearest Neighbor (k-NN), are tested, their outputs are compared, and their results are improved over several iterations through a number of steps that include stop-word removal, feature extraction, and the identification and extraction of important features. Initially, the performance of the classifiers is not satisfactory, as the accuracy achieved by all three is below 50%. An ensemble of classifiers is also tried, but it does not yield high accuracy. The results are analyzed carefully and improvements are made, including feature extraction, that raise the performance of these classifiers to a satisfactory level. It is further concluded that k-NN performs better than Support Vector Machine and Decision Tree in terms of accuracy, precision, recall and F-measure.
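As a rough illustration of the kind of pipeline summarized above, the following minimal sketch removes stop words, builds term-count features, classifies with k-NN, and reports accuracy, precision, recall and F-measure. It is not the authors' implementation: the stop-word list, the toy English documents, the cosine similarity measure, the value of k, and all function names are assumptions made for illustration only.

```python
from collections import Counter
import math

# Hypothetical stop-word list and toy documents (illustrative only,
# not drawn from the Urdu blog corpus described in the abstract).
STOP_WORDS = {"the", "is", "a"}

def featurize(text):
    """Lowercase, split on whitespace, drop stop words, count terms."""
    return Counter(t for t in text.lower().split() if t not in STOP_WORDS)

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_predict(train, text, k=3):
    """Return the majority label among the k most similar training documents."""
    query = featurize(text)
    ranked = sorted(train, key=lambda item: cosine(featurize(item[0]), query),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def evaluate(y_true, y_pred, positive="pos"):
    """Accuracy, plus precision, recall and F-measure for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}

train = [
    ("the movie is good", "pos"),
    ("a great and good story", "pos"),
    ("good acting great music", "pos"),
    ("the plot is bad", "neg"),
    ("a bad and boring film", "neg"),
    ("boring acting bad music", "neg"),
]
test = [("great good film", "pos"), ("bad boring story", "neg")]

predictions = [knn_predict(train, text, k=3) for text, _ in test]
scores = evaluate([label for _, label in test], predictions)
print(predictions, scores)
```

The same `evaluate` function could be applied to the predictions of any other classifier, which is one simple way to compare several classifiers on a common set of metrics.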
