Comparative Performance of Data Mining Techniques for Cyberbullying Detection of Arabic Social Media Text

Omar Kamal Eldien Hussien, Riham Mohamed Younis Haggag, Amal Elsayed Aboutabl

doi:10.17762/ijritcc.v11i11s.8167

Abstract

Cyberbullying has spread like a virus across social media platforms and is getting out of control. According to psychological studies on the subject, victims are suffering increasingly, sometimes to the point of suicide. Cyberbullying on social media is a worldwide problem. Social media use keeps growing, and while it can have useful effects, it also has negative ones, since platforms are abused through different forms of cyberbullying. Although there is a substantial body of work on cyberbullying detection in English, there are few studies for the Arabic language. Data mining techniques are often used to detect this problem, and in this study different data mining algorithms were applied to detect cyberbullying in Arabic texts. The study used three datasets: the Bullying dataset, consisting of 26,000 Arabic comments collected from kaggle.com; the Cyber_2021 dataset, consisting of 13,247 comments collected via github.com; and the Data 2022 dataset, consisting of 47,224 comments collected via Instagram. Two feature extraction methods, CountVectorizer and Tf-Idf, were used, and classifier performance was evaluated with accuracy, precision, recall, and F1 score. Using the Tf-Idf Vectorizer, the Bagging Classifier achieved high results on the Bullying dataset from Kaggle (accuracy 96.04, F1-score 95.98, recall 96.04, precision 95.95); the SVC model gave the highest results on the Cyber_2021 dataset from GitHub (accuracy 98.49, F1-score 98.49, recall 98.49, precision 98.50); and on the Data 2022 dataset from Instagram the results reached an accuracy of 77.51, F1-score 76.60, recall 77.51, and precision 77.24. The Tf-Idf Vectorizer gave better results than CountVectorizer in all cases.
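The abstract does not describe the feature extraction or the metric averaging in detail. The following is a minimal, dependency-free sketch of the two feature schemes being compared, not the authors' code: raw term counts in the spirit of scikit-learn's CountVectorizer, and the same counts reweighted by a smoothed inverse document frequency, idf(t) = ln((1 + n) / (1 + df(t))) + 1, followed by L2 normalisation of each comment, which is the default that scikit-learn's TfidfVectorizer documents. Whitespace tokenization, the toy comments, and the helper names `count_vectors`, `tfidf_vectors`, and `weighted_scores` are hypothetical and chosen for illustration. The sketch also computes support-weighted precision, recall, and F1. That averaging scheme is an assumption, but it is consistent with the abstract: weighted recall always equals accuracy, and the reported recall equals the reported accuracy on all three datasets.

```python
import math
from collections import Counter


def count_vectors(docs):
    """Bag-of-words counts over a shared, sorted vocabulary.

    Assumption: plain whitespace tokenization (not scikit-learn's token pattern).
    """
    tokenized = [doc.split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([counts[term] for term in vocab])
    return vocab, rows


def tfidf_vectors(docs):
    """Counts reweighted by smoothed IDF, then L2-normalised per document."""
    vocab, counts = count_vectors(docs)
    n = len(docs)
    # Document frequency: in how many comments each term appears at least once.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    # Smoothed IDF: rare terms get larger weights than common ones.
    idf = [math.log((1 + n) / (1 + d)) + 1 for d in df]
    rows = []
    for row in counts:
        weighted = [c * w for c, w in zip(row, idf)]
        norm = math.sqrt(sum(x * x for x in weighted)) or 1.0
        rows.append([x / norm for x in weighted])
    return vocab, rows


def weighted_scores(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1 (assumed averaging)."""
    labels = sorted(set(y_true) | set(y_pred))
    total = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / total
    precision = recall = f1 = 0.0
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        support = sum(t == label for t in y_true)
        prec = tp / predicted if predicted else 0.0
        rec = tp / support if support else 0.0
        f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        weight = support / total  # each class weighted by its share of true labels
        precision += weight * prec
        recall += weight * rec
        f1 += weight * f
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    comments = ["انت غبي جدا", "شكرا لك جدا", "انت رائع"]  # hypothetical toy comments
    vocab, counts = count_vectors(comments)
    _, tfidf = tfidf_vectors(comments)
    print(vocab)
    print(counts[0], [round(x, 3) for x in tfidf[0]])
    print(weighted_scores([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

In a full pipeline, either matrix would then be passed to classifiers such as scikit-learn's `BaggingClassifier` or `SVC`. The abstract does not report the preprocessing, tokenization, or hyperparameters the authors used, so none are assumed here. The design difference being compared is visible in the toy output: a term that occurs in only one comment receives a larger Tf-Idf weight than a term shared across comments, whereas CountVectorizer treats both equally.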
