Abstract

Sentiment analysis has applications in many areas, including market research, customer retention strategies, and product analysis. Although a few studies on the topic exist for Kurdish, it remains understudied, like other areas of Kurdish language processing, and suffers in particular from a lack of data. In this paper, we present research we conducted to analyze the sentiments of learners and educators toward online education during COVID-19 in the Kurdistan Region of Iraq. We collected the data from tweets posted in Kurdish (Sorani) up to March 2022. We retrieved about 600 tweets, which yielded 511 items after preprocessing. We used four Machine Learning algorithms, Naïve Bayes, SVM, Random Forest, and Logistic Regression, and analyzed their performance on our dataset. We conducted five experiments. The first four tested all algorithms under two scenarios, balanced and unbalanced datasets of positive/negative items, each with 80/20 and 90/10 training/testing splits. The fifth experiment had four parts: it set a limit on the number of selected features, starting at 500 and increasing it by 500 at a time up to 2000, and was run for both the 80/20 and 90/10 splits. The results showed that SVM was the best algorithm for building a sentiment analysis model on this dataset, with an accuracy of 89% at a maximum feature limit of 1000. The dataset is publicly available for non-commercial use under the CC0 1.0 Universal license.
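
The experimental design described above can be laid out as a small configuration grid. The following is a minimal sketch in plain Python, not the authors' code: the function names, the dictionary keys, and the numbering of experiments 1 to 4 are assumptions made for illustration, and the sketch only enumerates the configurations without training any model.

```python
from itertools import product

# The four algorithms, two dataset scenarios, and two splits named in the abstract.
ALGORITHMS = ["Naive Bayes", "SVM", "Random Forest", "Logistic Regression"]
SCENARIOS = ["balanced", "unbalanced"]
SPLITS = [(80, 20), (90, 10)]  # (training %, testing %)


def feature_limits(start=500, step=500, stop=2000):
    """Feature limits for the fifth experiment: 500, 1000, 1500, 2000."""
    return list(range(start, stop + 1, step))


def experiments_1_to_4():
    """One experiment per (scenario, split) pair, each testing all algorithms.

    The order in which pairs are numbered is an assumption.
    """
    return [
        {
            "experiment": number,
            "scenario": scenario,
            "split": split,
            "algorithms": list(ALGORITHMS),
        }
        for number, (scenario, split) in enumerate(
            product(SCENARIOS, SPLITS), start=1
        )
    ]


def experiment_5():
    """Four parts (one per feature limit), each run for both splits."""
    return [
        {"experiment": 5, "part": part, "max_features": limit, "split": split}
        for part, limit in enumerate(feature_limits(), start=1)
        for split in SPLITS
    ]


if __name__ == "__main__":
    for config in experiments_1_to_4() + experiment_5():
        print(config)
```

Under these assumptions, experiments 1 to 4 cover 16 algorithm runs (four configurations times four algorithms), and the fifth experiment covers eight combinations of feature limit and split.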
