Arabic Sentiment Analysis of YouTube Comments: NLP-Based Machine Learning Approaches for Content Evaluation

Dhiaa A Musleh, Mamoun Masoud Abdulqader, Mohammed Alghamdi, Nasro Min-Allah, Ali Alkhwaja, Ibrahim Alkhwaja, Faisal Alfawaz, Hussam Abahussain

doi:10.3390/bdcc7030127

Open Access

Abstract

YouTube is a popular video-sharing platform that offers a diverse range of content. Assessing the quality of a video without watching it is a significant challenge, especially since YouTube recently removed the public dislike count. Although comments can provide insight into a video's quality, reading through the comments section is time-consuming and overwhelming for both content creators and viewers. This paper proposes an NLP-based model that classifies Arabic comments as positive or negative. The model was trained on a novel dataset of 4212 labeled comments, with an inter-annotator agreement (Kappa) score of 0.818. Six classifiers were evaluated: SVM, Naïve Bayes (NB), Logistic Regression, KNN, Decision Tree, and Random Forest. NB achieved the best results, with 94.62% accuracy and an MCC score of 91.46%; its Precision, Recall, and F1-measure were 94.64%, 94.64%, and 94.62%, respectively. The Decision Tree performed less well, reaching 84.10% accuracy and an MCC score of 69.64% without TF-IDF. By analyzing viewers' sentiments toward videos, this study gives content creators insights for improving their content and audience engagement. It also addresses a gap in the literature by offering a comprehensive approach to Arabic sentiment analysis, an area in which research remains limited.
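The abstract reports accuracy, Precision, Recall, F1-measure, and MCC for the classifiers, plus a Kappa score for the dataset labels. The following is a minimal Python sketch of how these quantities are computed for a binary positive/negative task. It assumes the hypothetical string labels "pos" and "neg" and reports precision and recall for the positive class only. The abstract does not say whether the paper averages these metrics across classes, so its reported figures may use a different averaging scheme. The function names are illustrative and are not taken from the authors' code.

```python
import math
from collections import Counter


def confusion_counts(y_true, y_pred, positive="pos"):
    """Return (tp, fp, fn, tn) for a binary task; 'pos' is an assumed label name."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def binary_metrics(y_true, y_pred, positive="pos"):
    """Accuracy, positive-class precision/recall/F1, and MCC."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)); 0 means chance level.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa = (p_o - p_e) / (1 - p_e) for two annotators' labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items both annotators labeled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement by chance, from each annotator's label frequencies.
    ca, cb = Counter(rater_a), Counter(rater_b)
    p_e = sum(ca[label] * cb[label] for label in ca) / (n * n)
    if p_e == 1:
        return float("nan")  # undefined when both annotators use one identical label
    return (p_o - p_e) / (1 - p_e)
```

For example, with six comments where the classifier makes one false positive and one false negative, accuracy is 4/6 but MCC is only 1/3. Because MCC is 0 for chance-level predictions, it makes weak classifiers easier to spot than accuracy does.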