Detecting Hate Speech in Arabic Tweets During COVID-19 Using Machine Learning Approaches

Ruba Alhejaili, Abdullah Alsaeedi, Wael M S Yafooz

doi:10.1007/978-981-19-3148-2_39

Abstract

Content on the Web is increasing day by day, especially on social media, where users can express their opinions freely and without restrictions. As a result, many negative behaviors have emerged, such as abusive language, racism, and hate speech. Hate speech is one such negative manifestation of social media, and detecting it requires dedicated tools. In this paper, we detect hate speech in Arabic tweets published during the COVID-19 pandemic. We compiled a dataset during the pandemic, from January 31 to March 6, 2021. We used a set of machine learning models, namely support vector machine (SVM), random forest (RF), logistic regression (LR), decision tree (DT), AdaBoost, k-nearest neighbors (KNN), and Gaussian naïve Bayes (GNB). For feature extraction, we used TF-IDF and trained the models on three n-gram settings: unigram, bigram, and trigram. The best results were achieved by LR, RF, and SVM, with LR reaching an accuracy of 90.8%.

Keywords

Hate speech, Coronavirus classification, Feature extraction, Machine learning
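The TF-IDF feature extraction over unigrams, bigrams, and trigrams mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `ngrams` and `tfidf`, the whitespace tokenizer, the unsmoothed weighting `tf * log(N / df)`, and the English toy corpus are all assumptions made for illustration. Libraries such as scikit-learn typically apply smoothed and normalized variants.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token sequences, each joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs, n):
    """Compute one {n-gram: weight} dict per document.

    Assumed weighting: (count / n-grams in document) * log(N / df),
    where df is the number of documents containing the n-gram.
    """
    grams_per_doc = [ngrams(doc.split(), n) for doc in docs]

    # Document frequency: count each n-gram once per document.
    df = Counter()
    for grams in grams_per_doc:
        df.update(set(grams))

    n_docs = len(docs)
    weights = []
    for grams in grams_per_doc:
        counts = Counter(grams)
        total = len(grams) or 1  # avoid division by zero for short documents
        weights.append(
            {g: (c / total) * math.log(n_docs / df[g]) for g, c in counts.items()}
        )
    return weights


# Hypothetical toy corpus standing in for preprocessed tweets.
docs = ["stay home stay safe", "stay home please", "wash your hands"]

for n, name in [(1, "unigram"), (2, "bigram"), (3, "trigram")]:
    features = tfidf(docs, n)
    vocabulary = {g for doc in features for g in doc}
    print(name, "vocabulary size:", len(vocabulary))
```

Each per-document dictionary would then be mapped onto a fixed vocabulary to form the feature vector passed to a classifier such as LR, RF, or SVM. Larger n captures more context but yields sparser features, which is one reason to compare the three settings.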

Full Text