Sentiment Analysis of Arabic Tweets Using Supervised Machine Learning

Noor Khalid Bolbol, Ashraf Yunis Maghari

doi:10.1109/icpet51420.2020.00025

Abstract

The wealth of information available on social media makes it a suitable environment for identifying users' reactions and attitudes toward particular topics, products, or issues. To analyze this data and extract useful information, machine learning algorithms are used to classify it into predefined categories. Analyzing Arabic text is challenging, and few studies focus on Arabic text mining. This paper addresses sentiment analysis of Arabic tweets by comparing the performance of three machine learning classifiers: Logistic Regression (LR), K-Nearest Neighbors (KNN), and Decision Tree (DT). Four Arabic text datasets are used in the experiments to evaluate the classifiers, and four evaluation metrics are used for the comparison: recall, precision, F-measure, and accuracy. The results show that LR achieves a higher accuracy on large datasets (93%) than the other classifiers. LR's accuracy improved as the volume of data increased, whereas the other classifiers recorded a noticeable drop in accuracy on the last dataset (74% for KNN and DT on the 100K reviews dataset). In addition, KNN and LR outperform DT on small datasets such as AJGT and ASTD.
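The four metrics named above can be computed directly from a confusion matrix. The sketch below is illustrative only: the abstract does not state whether the metrics are computed for a single positive class or averaged over classes, so it assumes binary sentiment labels with `1` as the positive class. The function name `evaluate` is hypothetical, and for a multi-class setting one would repeat the computation per class and average the results (macro-averaging).

```python
from typing import Dict, Sequence


def evaluate(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Dict[str, float]:
    """Hypothetical helper: accuracy, precision, recall and F-measure for one positive class.

    Assumes binary labels; `positive` marks the class treated as "positive sentiment".
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the binary confusion matrix.
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1  # predicted positive, actually positive
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        elif t == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative

    total = tp + fp + fn + tn
    # Guard every division so empty or degenerate inputs return 0.0 instead of raising.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure (F1) is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }
```

For example, with `y_true = [1, 1, 0, 0, 1, 0]` and `y_pred = [1, 0, 0, 1, 1, 0]` there are 2 true positives, 1 false positive, 1 false negative, and 2 true negatives, so accuracy, precision, recall, and F-measure all equal 2/3. Accuracy alone can be misleading on imbalanced sentiment datasets, which is one reason to report precision, recall, and F-measure alongside it.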

Full Text