Abstract

Content on the Web is growing daily, especially on social media, where users can express their opinions freely and without restriction. As a result, negative behaviors such as abusive language, racism, and hate speech have become widespread. Hate speech is one such manifestation that requires dedicated tools to detect. In this paper, we attempt to detect hate speech in Arabic tweets published during the COVID-19 pandemic. We compiled a dataset of tweets from the pandemic period between January 31 and March 6, 2021. We evaluated a set of machine learning models, namely support vector machine (SVM), random forest (RF), logistic regression (LR), decision tree (DT), AdaBoost, k-nearest neighbors (KNN), and Gaussian naïve Bayes (GNB). For feature extraction, we used TF-IDF and trained the models on three n-gram settings: unigram, bigram, and trigram. The best results were achieved by LR, RF, and SVM, with LR reaching an accuracy of 90.8%.

Keywords: Hate speech; Coronavirus classification; Feature extraction; Machine learning
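To illustrate the feature-extraction step, the following is a minimal sketch of TF-IDF weighting over unigrams, bigrams, and trigrams. It is not the authors' implementation: the abstract does not state which TF-IDF variant or toolkit was used, so the weighting (term frequency normalized by document length, idf = ln(N / df)), the whitespace tokenizer, the function names `ngrams` and `tfidf`, and the English toy sentences are all assumptions made for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-token sequences of a token list, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs, n):
    """Return one {n-gram: weight} dict per document.

    Assumed weighting (not taken from the paper):
      tf  = count of the n-gram / number of n-grams in the document
      idf = ln(number of documents / number of documents containing the n-gram)
    """
    # Whitespace tokenization is a simplifying assumption.
    grams_per_doc = [ngrams(doc.split(), n) for doc in docs]

    # Document frequency: in how many documents each n-gram occurs.
    df = Counter()
    for grams in grams_per_doc:
        df.update(set(grams))

    n_docs = len(docs)
    weights = []
    for grams in grams_per_doc:
        counts = Counter(grams)
        total = len(grams)
        weights.append(
            {g: (c / total) * math.log(n_docs / df[g]) for g, c in counts.items()}
        )
    return weights


# Hypothetical toy corpus standing in for the tweet dataset.
docs = ["stay home stay safe", "stay home please", "wash your hands"]

# One feature set per n-gram setting: 1 = unigram, 2 = bigram, 3 = trigram.
features = {n: tfidf(docs, n) for n in (1, 2, 3)}
```

Each of the three feature sets would then be passed to the classifiers separately, so that the n-gram settings can be compared on equal terms.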
