Karonese Sentiment Analysis: A New Dataset and Preliminary Result

Ichwanul Muslim Karo Karo, Shahreen Kasim, Azizul Azhar Ramli, Mohd Farhan Md Fudzee

doi:10.30630/joiv.6.2-2.1119


Open Access



Abstract

The number of active social media users keeps increasing, and these users come from varied backgrounds. Active users habitually use their local or national language on social media to express their thoughts, ideas, perspectives, and opinions, to describe social conditions, and to socialize. Karonese is a non-English language spoken mostly in North Sumatra, Indonesia, with a unique morphology and phonology. Sentiment analysis has frequently been applied to local and national languages to obtain an overview of broader public opinion on a particular topic. Good-quality Karonese resources are needed for good Karonese sentiment analysis (KSA), and the scarcity of such resources is an obstacle to KSA research. This work provides a Karonese dataset drawn from multi-domain social media. To complete the dataset for sentiment analysis, sentiment labels were annotated by Karonese transcribers. Three kinds of experiments were then conducted: KSA using machine learning alone, and KSA using machine learning with each of two feature extraction methods, Bag-of-Words (BoW) and TF-IDF. The machine learning algorithms are Logistic Regression, Naïve Bayes, Support Vector Machine (SVM), and K-Nearest Neighbor. Feature extraction improves model performance in the range of 0.1–7.4 percent. Overall, TF-IDF contributes more than BoW as a feature extraction method. The combination of SVM with TF-IDF achieves the highest performance: accuracy of 58.1 percent, precision of 58.5 percent, recall of 57.2 percent, and F1 score of 57.84 percent.
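To make the difference between the two feature extraction methods concrete, the following is a minimal sketch in plain Python. It is an illustration only, not the authors' pipeline: the function names `bow_vectors` and `tfidf_vectors`, the whitespace tokenizer, the toy English documents, and the IDF variant `ln(N / df) + 1` are all assumptions, since the abstract does not specify these details.

```python
import math
from collections import Counter


def bow_vectors(docs):
    """Return (vocabulary, raw term-count vectors) for whitespace-tokenized docs."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors), assuming idf = ln(N / df) + 1."""
    vocab, counts = bow_vectors(docs)
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = [sum(1 for vec in counts if vec[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) + 1.0 for d in df]
    return vocab, [[tf * w for tf, w in zip(vec, idf)] for vec in counts]


# Hypothetical toy corpus (placeholder English text, not Karonese data).
docs = ["good good food", "bad food", "good service"]
vocab, bow = bow_vectors(docs)
_, tfidf = tfidf_vectors(docs)

for term, count, weight in zip(vocab, bow[1], tfidf[1]):
    print(f"{term:8s} bow={count} tfidf={weight:.3f}")
```

In the second toy document, BoW gives "bad" and "food" the same count, while TF-IDF weights "bad" higher because it occurs in fewer documents. This down-weighting of common terms is the general property that distinguishes TF-IDF from BoW. Either kind of vector can then be passed to a classifier such as SVM.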
