Abstract

Web 2.0 has helped user-generated platforms spread widely. Unfortunately, it has also allowed cyberbullying to spread. Cyberbullying can have negative effects, including depression and low self-esteem, so it has become crucial to develop tools for automated cyberbullying detection. Research on such tools has grown over the last decade, especially with recent advances in machine learning and natural language processing. Given the large body of work on this topic, it is vital to critically review the cyberbullying literature in the context of these advances. In this paper, we survey the automated detection of cyberbullying. Our survey sheds light on several challenges and limitations in the field, ranging from defining cyberbullying and collecting data to feature representation, model selection, training, and evaluation. We also offer suggestions for improving the task of cyberbullying detection. In addition to the survey, we propose to improve cyberbullying detection by addressing some of the limitations raised: 1) using recent contextual language models such as BERT to detect cyberbullying; and 2) using slang-based word embeddings to generate better representations of cyberbullying-related datasets. Our results show that BERT outperforms state-of-the-art cyberbullying detection models and deep learning models. The results also show that deep learning models initialized with slang-based word embeddings outperform deep learning models initialized with traditional word embeddings.
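
The abstract compares deep learning models initialized with slang-based versus traditional word embeddings. Below is a minimal sketch of that initialization step under assumptions the source does not state: the vocabulary and embedding tables are plain Python dicts, words missing from a table receive small random vectors, and the names `build_embedding_matrix`, `general`, and `slang`, along with the toy vectors, are hypothetical. The same function serves both settings; only the lookup table changes. The returned coverage value is one simple way to see why a slang-aware table might represent informal social media text better.

```python
import random
from typing import Dict, List, Tuple


def build_embedding_matrix(
    vocab: List[str],
    lookup: Dict[str, List[float]],
    dim: int,
    seed: int = 0,
) -> Tuple[List[List[float]], float]:
    """Return one row per vocabulary word (same order as `vocab`) and the
    fraction of words found in `lookup`.

    Assumption: out-of-vocabulary words get small uniform random vectors,
    a common but not universal choice.
    """
    rng = random.Random(seed)
    matrix: List[List[float]] = []
    hits = 0
    for word in vocab:
        vec = lookup.get(word)
        if vec is not None and len(vec) == dim:
            matrix.append(list(vec))  # copy the pretrained vector
            hits += 1
        else:
            matrix.append([rng.uniform(-0.05, 0.05) for _ in range(dim)])
    coverage = hits / len(vocab) if vocab else 0.0
    return matrix, coverage


# Hypothetical toy data: "noob" is a slang token absent from the general table.
vocab = ["you", "are", "a", "noob"]
general = {"you": [0.1, 0.2], "are": [0.0, 0.3], "a": [0.2, 0.1]}
slang = {**general, "noob": [0.9, -0.4]}

for name, table in [("general", general), ("slang", slang)]:
    _, coverage = build_embedding_matrix(vocab, table, dim=2)
    print(f"{name}: coverage={coverage:.2f}")
# prints "general: coverage=0.75" then "slang: coverage=1.00"
```

In a real model the resulting matrix would be used to initialize an embedding layer before training; the random fallback range is an arbitrary illustrative choice.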

Highlights

  • The internet has become an important development tool for young people

  • BERT significantly outperformed the models from the replicated study (a logistic regression (LR) model and an updated Multilayer Perceptron (MLP) model), as well as the state-of-the-art Long Short-Term Memory (LSTM) and Bi-LSTM models, on all three tested datasets. It achieved its highest performance on the Kaggle-insults dataset, with an F1-score of 0.768, and its lowest on Twitter-racism, with an F1-score of 0.747

  • The Friedman statistical significance test [160] was used to compare the models' performance statistically, showing that BERT significantly outperformed all other models (p < 0.05). These results demonstrate that BERT significantly improves performance on the task of cyberbullying detection compared to other widely used text classification methods
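
The Friedman test ranks the competing models separately on each dataset and asks whether their average ranks differ more than chance would allow. Below is a minimal pure-Python sketch, assuming higher scores are better, omitting the tie correction to the statistic, and using the chi-square approximation with k - 1 degrees of freedom, which is rough when there are only a few datasets. The function names and the F1 scores in the example are hypothetical and are not the paper's results. A significant Friedman result only shows that at least one model differs; pairwise claims are usually backed by a post-hoc test such as Nemenyi.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(row: Sequence[float]) -> List[float]:
    """Rank scores within one dataset: highest score gets rank 1; ties share the mean rank."""
    order = sorted(range(len(row)), key=lambda j: -row[j])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with integer df (closed form)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # exp(-x/2) * sum_{i=0}^{df/2-1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series in x / (1*3*5*...).
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return total


def friedman_test(scores: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """scores[i][j] is the score of model j on dataset i. Returns (chi2_F, p_value)."""
    n, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    for row in scores:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    mean_ranks = [s / n for s in rank_sums]
    chi2_f = 12 * n / (k * (k + 1)) * (
        sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4
    )
    return chi2_f, chi2_sf(chi2_f, k - 1)


# Hypothetical F1 scores: rows are datasets, columns are models A, B, C.
scores = [
    [0.76, 0.70, 0.65],
    [0.75, 0.71, 0.66],
    [0.77, 0.69, 0.60],
    [0.74, 0.72, 0.68],
]
stat, p = friedman_test(scores)
print(f"chi2_F={stat:.3f}, p={p:.4f}")  # prints "chi2_F=8.000, p=0.0183"
```

Ranking within each dataset, rather than averaging raw F1-scores, keeps one unusually easy or hard dataset from dominating the comparison.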



Introduction

The internet has become an important development tool for young people. It provides a great source of information and a means of communication. One study showed that children experienced bad language in the form of insults or swearing, aggressive communication, or harassment. Social media platforms provide a fruitful environment for cyberbullying in the form of threats, harassment, and the exploitation of potential victims [3]. The Pew Research Center reported in 2017 that 40% of social media users had experienced some form of cyberbullying [4]. Another study of 200 university students found that 91% had experienced cyberbullying, 55.5% of them on Instagram and 38% on Facebook [5].

