Abstract

The advent of social media, particularly Twitter, has raised many issues that stem from a misunderstanding of the concept of freedom of speech. One of these issues is cyberbullying, a critical global problem that affects both individual victims and societies. Many approaches have been proposed in the literature to intervene in, prevent, or mitigate cyberbullying; however, because these approaches rely on the victims’ interactions, they are not practical. Therefore, detecting cyberbullying without the involvement of the victims is necessary. In this study, we explored this issue by compiling a global dataset of 37,373 unique tweets from Twitter. Seven machine learning classifiers were used, namely, Logistic Regression (LR), Light Gradient Boosting Machine (LGBM), Stochastic Gradient Descent (SGD), Random Forest (RF), AdaBoost (ADB), Naive Bayes (NB), and Support Vector Machine (SVM). Each classifier was evaluated on the global dataset using accuracy, precision, recall, and F1 score as the performance metrics. The experimental results show the superiority of LR, which achieved a median accuracy of around 90.57%. Among the classifiers, LR achieved the best F1 score (0.928), SGD achieved the best precision (0.968), and SVM achieved the best recall (1.00).
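The four metrics named above are related through the counts of a binary confusion matrix, which helps explain why the classifier with the best recall is not necessarily the one with the best F1 score. The following is a minimal sketch using the standard definitions; the function names and the counts are hypothetical illustrations, not values or code from the study.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


# Hypothetical counts: a classifier that flags every tweet as bullying
# misses nothing (recall 1.0) but pays for it in precision and F1.
flag_all = precision_recall_f1(tp=60, fp=40, fn=0)

# Hypothetical counts for a more balanced classifier on the same 100 tweets.
balanced = precision_recall_f1(tp=54, fp=6, fn=6)
balanced_accuracy = accuracy(tp=54, tn=34, fp=6, fn=6)
```

With these made-up counts, the flag-everything classifier reaches perfect recall yet a lower F1 than the balanced one, which is why the metrics are reported side by side.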

Highlights

  • Due to the significant development of Internet 2.0 technology, social media sites such as Twitter and Facebook have become popular and play a significant role in transforming human life [1,2]

  • AdaBoost has been used for cyberbullying detection in [103] and [63], as well as in [104], which obtained an accuracy of 76.39% with AdaBoost using unigrams, comments, profile, and media information as features

  • Term Frequency-Inverse Document Frequency (TF-IDF) combines Term Frequency (TF) and Inverse Document Frequency (IDF); the algorithm uses word statistics for text feature extraction
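
To make the TF-IDF idea concrete, here is a minimal sketch in plain Python. It assumes one common variant of the weighting (term frequency normalized by document length, and IDF computed as log(N / df)); the paper may use a different variant or a library implementation. The function name `tf_idf` and the toy tweets are hypothetical and are not taken from the study's dataset.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy, made-up examples for illustration only.
tweets = [
    "you are so kind".split(),
    "you are so stupid".split(),
    "have a kind day".split(),
]
scores = tf_idf(tweets)
```

In this toy corpus, a word that appears in only one tweet (such as "stupid") receives a higher weight than a word shared by several tweets (such as "you"), which is the property that makes TF-IDF features useful to a text classifier.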


Summary

Introduction

Due to the significant development of Internet 2.0 technology, social media sites such as Twitter and Facebook have become popular and play a significant role in transforming human life [1,2]. Cyberbullying is a research topic in which researchers aim to detect, control, and reduce cyberbullying in social media. One direction in this field is to detect a user’s intention to post offensive content by analyzing offensive language based on different features, such as the structure and unique content, in addition to the users’ writing style. Another direction of cyberbullying research is to apply machine learning to text content for offensive language detection and classification. The outcomes of the current evaluation will help other researchers choose a suitable classifier for the datasets of global cyberbullying tweets collected from [12,13], because improvements are still necessary to further increase the classification accuracy.

Background and Related Work
Machine Learning in Cyberbullying Detection
Logistic Regression
Light Gradient Boosting Machine
Stochastic Gradient Descent
Random Forest
Multinomial Naive Bayes
Dataset
Classification Techniques
Results and Discussion
Evaluation Metrics
Conclusions
