A Comparative Analysis of Machine Learning Techniques for Cyberbullying Detection on Twitter

Amgad Muneer, Suliman Mohamed Fati

doi:10.3390/fi12110187

Abstract

The advent of social media, particularly Twitter, raises many issues due to a misunderstanding of the concept of freedom of speech. One of these issues is cyberbullying, a critical global problem that affects both individual victims and societies. Many attempts have been introduced in the literature to intervene in, prevent, or mitigate cyberbullying; however, because these attempts rely on the victims’ interactions, they are not practical. Therefore, cyberbullying needs to be detected without involving the victims. In this study, we explore this issue by compiling a global dataset of 37,373 unique tweets from Twitter. Seven machine learning classifiers were used, namely, Logistic Regression (LR), Light Gradient Boosting Machine (LGBM), Stochastic Gradient Descent (SGD), Random Forest (RF), AdaBoost (ADB), Naive Bayes (NB), and Support Vector Machine (SVM). Each classifier was evaluated on the global dataset using accuracy, precision, recall, and F1 score as the performance metrics. The experimental results show the superiority of LR, which achieved a median accuracy of around 90.57%. Among the classifiers, LR achieved the best F1 score (0.928), SGD achieved the best precision (0.968), and SVM achieved the best recall (1.00).
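The four metrics named above can be computed directly from a classifier's predictions. The following is a minimal sketch, assuming binary labels where 1 marks a bullying tweet (the positive class) and precision, recall, and F1 are reported for that class only; the paper's exact averaging scheme is not stated in this summary, and the function name and sample labels are hypothetical.

```python
from __future__ import annotations


def classification_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Accuracy, precision, recall, and F1 for a binary task.

    Assumption: label 1 marks a bullying tweet (positive class), 0 a
    non-bullying tweet; precision/recall/F1 are computed for class 1 only.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # bullying, flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # bullying, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign, not flagged
    accuracy = (tp + tn) / len(pairs)
    # Guard against 0/0 when the positive class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels for eight tweets (not taken from the paper's dataset).
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
print(classification_metrics(y_true, [1, 1, 0, 0, 1, 1, 0, 1]))
# A degenerate model that flags every tweet still reaches recall 1.0.
print(classification_metrics(y_true, [1] * 8))
```

The second call shows why recall alone is not enough to rank classifiers: flagging every tweet guarantees a recall of 1.0 while precision drops to the share of bullying tweets in the data, which is why F1 is reported alongside it.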

Highlights

Due to the significant development of Internet 2.0 technology, social media sites such as Twitter and Facebook have become popular and play a significant role in transforming human life [1,2].
AdaBoost has been used for cyberbullying detection by several researchers, such as [103] and [63]; for example, the work in [104] obtained an accuracy of 76.39% with AdaBoost, using unigrams, comments, profile, and media information as features.
Term Frequency-Inverse Document Frequency (TF-IDF) combines Term Frequency (TF) and Inverse Document Frequency (IDF); it is a text feature extraction method based on word statistics.
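To make the TF-IDF highlight concrete, the sketch below computes the textbook variant, tf(t, d) = count of t in d divided by the length of d and idf(t) = log(N / df(t)), in plain Python. This summary does not say which variant the authors used; library implementations differ (for example, scikit-learn's `TfidfVectorizer` uses raw counts, a smoothed idf, and L2 normalization by default), so the numbers will not match theirs. The `tokenize` helper and the sample tweets are hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical minimal tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one {term: weight} dict per tokenized document.

    tf(t, d) = count of t in d / number of tokens in d
    idf(t)   = log(N / df(t)), df(t) = number of documents containing t
    weight   = tf * idf
    """
    n_docs = len(docs)
    df: Counter[str] = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term at most once per document
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    weights = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc)
        weights.append({term: (c / length) * idf[term] for term, c in counts.items()})
    return weights


tweets = ["you are so stupid", "have a nice day", "You are stupid and ugly"]
for weights in tf_idf([tokenize(t) for t in tweets]):
    # Show the three highest-weighted terms of each tweet.
    print(sorted(weights.items(), key=lambda kv: -kv[1])[:3])
```

A term that occurs in every document gets idf = log(1) = 0 and therefore zero weight, while a term confined to one tweet (here "so") outweighs one shared by two tweets (here "stupid"); this down-weighting of common words is the point of the IDF factor.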

Summary

Introduction

Due to the significant development of Internet 2.0 technology, social media sites such as Twitter and Facebook have become popular and play a significant role in transforming human life [1,2]. Cyberbullying has become an active research topic, with researchers aiming to detect, control, and reduce cyberbullying in social media. One direction in this field is to detect a user’s intention to post offensive content by analyzing offensive language based on different features, such as the structure and unique content of posts, in addition to the users’ writing style. Another direction is to classify text content with machine learning for offensive language detection. Because classification accuracy still needs to improve, the outcomes of the current evaluation are intended to help other researchers choose a suitable classifier for the datasets of global cyberbullying tweets collected from [12,13].
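The second direction, classifying tweet text with machine learning, can be illustrated with logistic regression, the best-performing classifier in this study. The following is a minimal pure-Python sketch of logistic regression fitted by stochastic gradient descent on the log loss; it is not the authors' pipeline. Binary unigram features stand in for TF-IDF to keep it short, and the toy tweets, learning rate, epoch count, and helper names (`build_vocab`, `featurize`, `train_logreg_sgd`) are all hypothetical. Note also that a library "SGD" classifier is not necessarily logistic regression: scikit-learn's `SGDClassifier`, for instance, defaults to the hinge loss.

```python
from __future__ import annotations

import math
import random


def build_vocab(texts: list[str]) -> list[str]:
    # Sorted so that feature indices are deterministic.
    return sorted({tok for text in texts for tok in text.lower().split()})


def featurize(text: str, vocab: list[str]) -> list[float]:
    """Binary unigram features: 1.0 if the vocabulary word occurs in the text."""
    tokens = set(text.lower().split())
    return [1.0 if word in tokens else 0.0 for word in vocab]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg_sgd(X: list[list[float]], y: list[int], lr: float = 0.5,
                     epochs: int = 200, seed: int = 0) -> tuple[list[float], float]:
    """Fit logistic regression with per-example SGD on the log loss."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit examples in a new random order each epoch
        for i in order:
            z = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            err = sigmoid(z) - y[i]  # derivative of the log loss w.r.t. z
            w = [wj - lr * err * xj for wj, xj in zip(w, X[i])]
            b -= lr * err
    return w, b


def predict(w: list[float], b: float, x: list[float]) -> int:
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if sigmoid(z) >= 0.5 else 0


# Hypothetical toy tweets; 1 = bullying, 0 = not bullying.
TWEETS = ["you are stupid", "nobody likes you loser", "stupid loser",
          "have a great day", "nice game today", "great job everyone"]
LABELS = [1, 1, 1, 0, 0, 0]
vocab = build_vocab(TWEETS)
w, b = train_logreg_sgd([featurize(t, vocab) for t in TWEETS], LABELS)
print(predict(w, b, featurize("you stupid loser", vocab)))
```

Words not in the training vocabulary are simply ignored by `featurize`, which is one reason real systems use much larger corpora, such as the 37,373-tweet dataset described above, than this toy example.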

Background and Related Work

Machine Learning in Cyberbullying Detection

Logistic Regression

Light Gradient Boosting Machine

Stochastic Gradient Descent

Random Forest

Multinomial Naive Bayes

Dataset

Classification Techniques

Results and Discussion

Evaluation Metrics

Conclusions


Journal: Future Internet	Publication Date: Oct 29, 2020
Citations: 85	License type: CC BY 4.0

Similar Papers

Towards Sustainable Equine Welfare: Comparative Analysis of Machine Learning Techniques in Predicting Horse Survival
Mahmoud Ismail
Sustainable Machine Intelligence Journal | VOL. 5
29 Nov 2023

Classification and Analysis of Weather Images Using Machine Intelligent Based Approach
Krishna Prasad K ... Kalyan Kumar Jena
International Journal of Applied Engineering and Management Letters | VOL. -
29 Aug 2022

A Machine Intelligent Based Approach for the Classification and Analysis of Tomato Leaf Disease Images
Kalyan Kumar Jena ... Krishna Prasad K
International Journal of Health Sciences and Pharmacy | VOL. -
31 Aug 2022

Classification of Breast Ultrasound Images: An Analysis Using Machine Intelligent Based Approach
Kalyan Kumar Jena ... Krishna Prasad K
International Journal of Management, Technology, and Social Sciences | VOL. -
31 Aug 2022
