A Comparative Analysis of Machine Learning Techniques for Cyberbullying Detection on FormSpring in Textual Modality

Anil Kumar K M, Sahana V, Abdulbasit A Darem

doi:10.5815/ijcnis.2023.04.04

Abstract

Social media usage has increased tremendously with the rise of the internet, and social media has evolved into the most powerful networking platform of the twenty-first century. However, increased use of social networking is associated with a number of undesirable phenomena, such as cyberbullying (CB), cybercrime, online abuse and online trolling. Cyberbullying can have severe psychological and physical effects, especially on children and women, and can even lead to self-harm or suicide. Because of its significant detrimental social impact, the detection of CB text or messages on social media has attracted growing research attention. To mitigate CB, we propose an automated cyberbullying detection model that classifies content as either bullying or non-bullying (a binary classification model), creating a more secure social media experience. The proposed model uses Natural Language Processing (NLP) techniques and Machine Learning (ML) approaches to assess cyberbullying content. Our main goal is to evaluate the performance of different machine learning algorithms for cyberbullying detection on a labelled dataset from Formspring [1]. Nine popular machine learning classifiers are considered: Bootstrap Aggregation (Bagging), Stochastic Gradient Descent (SGD), Random Forest (RF), Decision Tree (DT), Linear Support Vector Classifier (Linear SVC), Logistic Regression (LR), Adaptive Boosting (AdaBoost), Multinomial Naive Bayes (MNB) and K-Nearest Neighbour (KNN). In addition, we experiment with CountVectorizer as a feature extraction method to obtain features that support better classification. The results show that the AdaBoost classifier achieves a classification accuracy of 86.52%, the highest among all the machine learning algorithms used in this study.
The proposed work demonstrates the effectiveness of machine learning algorithms for automatic cyberbullying detection, as opposed to highly intensive and time-consuming approaches to the same problem. This makes it easier to incorporate an effective approach as a tool across different platforms, enabling people to use social media safely.
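The pipeline described above pairs count-based feature extraction with a binary classifier. The following is a minimal, standard-library-only sketch of that idea. It is not the authors' implementation. The tokenizer loosely mirrors the defaults of scikit-learn's `CountVectorizer`, which lowercases text and keeps tokens of two or more word characters. For brevity the classifier is Multinomial Naive Bayes with Laplace smoothing, one of the nine classifiers listed, rather than AdaBoost, which performed best in the study. The training sentences, the labels, and every function name (`tokenize`, `fit_vocabulary`, `count_vectorize`, `train_mnb`, `predict_mnb`, `accuracy`) are hypothetical illustrations and do not come from the FormSpring dataset.

```python
import math
import re


def tokenize(text):
    # Lowercase and keep tokens of 2+ word characters, loosely mirroring
    # CountVectorizer's default token pattern r"(?u)\b\w\w+\b".
    return re.findall(r"\b\w\w+\b", text.lower())


def fit_vocabulary(docs):
    # Assign each distinct token a column index in sorted order.
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(vocab)}


def count_vectorize(docs, vocab):
    # Represent each document as raw term counts; tokens not in vocab are ignored.
    rows = []
    for doc in docs:
        row = [0] * len(vocab)
        for tok in tokenize(doc):
            if tok in vocab:
                row[vocab[tok]] += 1
        rows.append(row)
    return rows


def train_mnb(X, y, alpha=1.0):
    # Multinomial Naive Bayes: log class priors plus smoothed log term likelihoods.
    classes = sorted(set(y))
    n_features = len(X[0])
    log_prior, log_lik = {}, {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        log_prior[c] = math.log(len(rows) / len(X))
        totals = [sum(col) for col in zip(*rows)]
        denom = sum(totals) + alpha * n_features
        log_lik[c] = [math.log((t + alpha) / denom) for t in totals]
    return classes, log_prior, log_lik


def predict_mnb(model, X):
    # Pick the class with the highest log-posterior score for each count vector.
    classes, log_prior, log_lik = model
    preds = []
    for x in X:
        scores = {
            c: log_prior[c] + sum(cnt * lp for cnt, lp in zip(x, log_lik[c]))
            for c in classes
        }
        preds.append(max(scores, key=scores.get))
    return preds


def accuracy(y_true, y_pred):
    # Fraction of predictions that match the true labels.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data: 1 = bullying, 0 = non-bullying (not FormSpring data).
train_docs = [
    "you are so stupid and ugly",
    "nobody likes you loser",
    "go away you stupid loser",
    "have a great day friend",
    "thanks for the help today",
    "what a great game last night",
]
train_labels = [1, 1, 1, 0, 0, 0]

vocab = fit_vocabulary(train_docs)
X_train = count_vectorize(train_docs, vocab)
model = train_mnb(X_train, train_labels)

test_docs = ["you stupid loser", "great help thanks"]
preds = predict_mnb(model, count_vectorize(test_docs, vocab))
print(preds, accuracy([1, 0], preds))
```

On this toy input, the first test sentence shares only "bullying" vocabulary and the second only "non-bullying" vocabulary, so the sketch labels them 1 and 0. A full comparison of nine classifiers would more plausibly use library implementations, for example scikit-learn's `CountVectorizer`, `MultinomialNB` and `AdaBoostClassifier`, evaluated on a held-out split of the labelled data.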
