Textual Cyberbullying detection using Ensemble of Machine Learning models

Gull Bano Anwar, Muhammad Waqas Anwar

doi:10.1109/icit56493.2022.9988943

Abstract

Cyberbullying on social media has become a major problem. It can have serious negative mental, emotional, and physical effects on a person's life. However, cyberbullying leaves a record that can serve as evidence and help stop digital abuse. Early detection of cyberbullying on social media is therefore crucial to reducing its effect on users. Numerous studies aim to identify cyberbullying content automatically. The main gap in cyberbullying detection is the absence of linguistic resources, especially for recently developed languages. Romanized Urdu is a recently developed language that is frequently used on social networking platforms in Asian nations. The best way to stop cyberbullying is to detect it automatically using Machine Learning or Deep Learning with Natural Language Processing (NLP) techniques. This study develops an efficient framework to detect cyberbullying using NLP tools with Machine Learning models. The proposed framework is validated, using different preprocessing techniques, on the roman-Urdu-abusive-comment-detector (RUACD) dataset. Five machine learning models, Support Vector Machine (SVM), Naïve Bayes (NB), Logistic Regression (LR), Random Forest (RF), and Decision Tree (DT), are evaluated on the RUACD dataset. In the experiments, SVM, LR, and DT outperformed the other models and achieved promising results. Finally, an ensemble of these three best-performing models is formed, which achieves a test accuracy of 95.9%.
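
The abstract does not say how the predictions of SVM, LR, and DT are combined. As a minimal sketch only, assuming hard (majority) voting over binary labels (1 = abusive, 0 = not abusive), the combination step could look like the following. The function names `majority_vote` and `accuracy` and all prediction values are hypothetical toy data for illustration, not results or code from the study.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists by hard (majority) voting.

    predictions: list of equal-length label lists, one list per model.
    Assumption: the ensemble uses majority voting; the abstract does
    not specify the combination rule.
    """
    if not predictions:
        raise ValueError("need predictions from at least one model")
    n = len(predictions[0])
    if any(len(p) != n for p in predictions):
        raise ValueError("all models must predict the same number of samples")
    combined = []
    for votes in zip(*predictions):
        # most_common(1) returns the label with the highest vote count;
        # with three models and two labels there can be no tie.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical toy predictions from three already-trained models.
svm_pred = [1, 0, 1, 0, 0]
lr_pred = [1, 0, 0, 1, 0]
dt_pred = [0, 0, 1, 1, 1]
y_true = [1, 0, 1, 1, 0]

ensemble_pred = majority_vote([svm_pred, lr_pred, dt_pred])
print(ensemble_pred, accuracy(y_true, ensemble_pred))
```

In this toy data each model errs on different samples, so the vote corrects the individual mistakes. That is the usual motivation for combining several reasonably accurate models whose errors are not strongly correlated.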

Full Text