Cyberbullying Detection: Hybrid Models Based on Machine Learning and Natural Language Processing Techniques

Chahat Raj, Ayush Agarwal, Bhuva Narayan, Gnana Bharathy, Mukesh Prasad

doi:10.3390/electronics10222810

Abstract

The rise in web and social media interactions has resulted in the effortless proliferation of offensive language and hate speech. Such online harassment, insults, and attacks are commonly termed cyberbullying. The sheer volume of user-generated content has made it challenging to identify such illicit content. Machine learning has wide applications in text classification, and researchers are shifting towards deep neural networks for cyberbullying detection because of the several advantages they have over traditional machine learning algorithms. This paper proposes a novel neural network framework with parameter optimization and presents a comparative study of eleven classification methods (four traditional machine learning and seven shallow neural networks) on two real-world cyberbullying datasets. It also examines how natural language processing techniques for feature extraction and word embedding affect algorithmic performance. Key observations from this study show that bidirectional neural networks and attention models achieve high classification performance. Logistic Regression was the best of the traditional machine learning classifiers used. Term Frequency-Inverse Document Frequency (TF-IDF) yields consistently high accuracy with traditional machine learning techniques, while Global Vectors (GloVe) perform better with neural network models. Bi-GRU and Bi-LSTM performed best among the neural networks used. The extensive experiments on the two datasets, comparing eleven classification methods and seven feature extraction techniques, establish the importance of this work. Our proposed shallow neural networks outperform existing state-of-the-art approaches for cyberbullying detection, with accuracy and F1-scores as high as ~95% and ~98%, respectively.
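The abstract reports results as accuracy and F1-score. As a minimal sketch of how these two metrics are computed from gold labels and predictions, the functions below use binary F1 on the positive (bullying) class encoded as 1. That averaging choice, the label encoding, and the function names are assumptions for illustration; the abstract does not state how F1 was averaged.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (true positives, false positives, false negatives) for one class."""
    tp = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1  # predicted bullying, gold label is not bullying
        elif t == positive:
            fn += 1  # missed a bullying example
    return tp, fp, fn


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold label."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and equally long")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp, fp, fn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = [1, 1, 1, 1, 0]  # hypothetical labels, 1 = bullying
    pred = [1, 1, 1, 1, 1]
    print(accuracy(gold, pred))            # 0.8
    print(round(f1_score(gold, pred), 3))  # 0.889
```

The toy example shows that the two metrics need not agree: when the positive class dominates, binary F1 can exceed accuracy (here 0.889 versus 0.8).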

Highlights

Social media is an interactive tool that brings people together to share information. The primary function of Online Social Networks (OSNs) is to allow people to communicate virtually over the internet.
We propose a novel architecture for cyberbullying detection that employs bidirectional Gated Recurrent Units (GRUs) with Global Vectors (GloVe) for text representation.
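The highlight above pairs a bidirectional GRU with GloVe text representation. The sketch below shows only the forward pass of such an encoder in plain Python: each token is looked up in an embedding table, one GRU reads the sequence left to right, a second reads it right to left, and the two final hidden states are concatenated and passed to a sigmoid output. Everything concrete here is an assumption, not the authors' configuration: the hidden size, the random untrained weights, the zero vector for out-of-vocabulary words, the single sigmoid output, the hypothetical class and function names, and the tiny three-dimensional vectors standing in for real GloVe embeddings. `load_glove` assumes the plain-text GloVe format of one word followed by its space-separated components per line. Libraries differ slightly in the GRU equations (for example, where the reset gate is applied); this uses one common formulation.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def vadd(*vectors):
    """Element-wise sum of equally sized vectors."""
    return [sum(xs) for xs in zip(*vectors)]


def load_glove(path):
    """Read a plain-text GloVe file: a word, then its space-separated floats."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, *values = line.rstrip().split(" ")
            vectors[word] = [float(v) for v in values]
    return vectors


class GRUCell:
    """One GRU direction: update gate z, reset gate r, candidate state n."""

    def __init__(self, input_dim, hidden_dim, rng):
        def mat(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.hidden_dim = hidden_dim
        self.W = {g: mat(hidden_dim, input_dim) for g in "zrn"}   # input weights
        self.U = {g: mat(hidden_dim, hidden_dim) for g in "zrn"}  # recurrent weights
        self.b = {g: [0.0] * hidden_dim for g in "zrn"}           # biases

    def _affine(self, g, x, h):
        return vadd(matvec(self.W[g], x), matvec(self.U[g], h), self.b[g])

    def step(self, x, h):
        z = [sigmoid(v) for v in self._affine("z", x, h)]
        r = [sigmoid(v) for v in self._affine("r", x, h)]
        # The reset gate scales the previous state before the recurrent product.
        rh = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(v) for v in self._affine("n", x, rh)]
        # The new state interpolates between the candidate and the old state.
        return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]

    def run(self, xs):
        h = [0.0] * self.hidden_dim
        for x in xs:
            h = self.step(x, h)
        return h


class BiGRUClassifier:
    """Untrained Bi-GRU encoder with a sigmoid output (illustrative only)."""

    def __init__(self, embeddings, hidden_dim=4, seed=0):
        rng = random.Random(seed)
        self.embeddings = embeddings
        self.dim = len(next(iter(embeddings.values())))
        self.fwd = GRUCell(self.dim, hidden_dim, rng)
        self.bwd = GRUCell(self.dim, hidden_dim, rng)
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(2 * hidden_dim)]
        self.b_out = 0.0

    def encode(self, tokens):
        # Out-of-vocabulary tokens become zero vectors (an assumption).
        xs = [self.embeddings.get(t, [0.0] * self.dim) for t in tokens]
        # Concatenate the final left-to-right and right-to-left hidden states.
        return self.fwd.run(xs) + self.bwd.run(xs[::-1])

    def predict_proba(self, tokens):
        h = self.encode(tokens)
        return sigmoid(sum(w * v for w, v in zip(self.w_out, h)) + self.b_out)


if __name__ == "__main__":
    toy_vectors = {  # hypothetical 3-d vectors, NOT real GloVe values
        "you": [0.1, 0.3, -0.2],
        "are": [0.0, 0.1, 0.4],
        "stupid": [-0.7, 0.2, 0.5],
    }
    model = BiGRUClassifier(toy_vectors, hidden_dim=4, seed=0)
    print(len(model.encode(["you", "are", "stupid"])))  # 8 = 2 * hidden_dim
    print(model.predict_proba(["you", "are", "stupid"]))
```

Because the weights are random, the printed probability carries no meaning; in practice they would be learned by backpropagation in a deep learning framework. The sketch is only meant to make the bidirectional data flow concrete.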

Summary

Introduction

Social media is an interactive tool that brings people together to share information. Several traditional machine learning algorithms require explicit feature extraction from input data. Deep learning techniques were adopted to overcome this limitation of traditional machine learning: they eliminate the manual feature extraction step and obtain better results on large-scale datasets. State-of-the-art techniques for cyberbullying detection largely rely on RNNs, CNNs, and transformers because of their improved accuracy compared to traditional machine learning classifiers. The embedding techniques tested with these shallow neural networks include Global Vectors (GloVe), FastText, and Paragram. This comparative study examines the performance of the algorithms and of their feature extraction. We compare the classification performance of four traditional machine learning and seven neural-network-based algorithms. We experiment with several feature extraction techniques and determine the best-suited approaches for feature extraction and text embedding for both traditional machine learning and neural-network-based methods.
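The paragraph above contrasts traditional classifiers, which need explicit features, with neural networks that learn from embeddings. The abstract reports that Logistic Regression was the best traditional classifier and that TF-IDF gave consistently high accuracy with traditional techniques. A minimal, dependency-free sketch of such a pipeline follows. It assumes whitespace tokenisation, raw term counts, the smoothed idf ln((1 + n) / (1 + df(t))) + 1 with L2-normalised rows (the weighting scikit-learn's `TfidfVectorizer` uses by default), and plain full-batch gradient descent with a hypothetical learning rate and epoch count. The class names and the four example posts are made up for illustration and are not drawn from the paper's datasets.

```python
import math
from collections import Counter


def tokenize(text):
    """Lower-case whitespace tokenisation (a simplification)."""
    return text.lower().split()


class SimpleTfidf:
    """TF-IDF with raw term counts, smoothed idf and L2-normalised rows."""

    def fit(self, docs):
        tokenized = [tokenize(d) for d in docs]
        self.vocabulary = sorted({t for toks in tokenized for t in toks})
        self.index = {t: j for j, t in enumerate(self.vocabulary)}
        n = len(docs)
        # Document frequency: number of documents containing each term.
        df = Counter(t for toks in tokenized for t in set(toks))
        # Smoothed idf(t) = ln((1 + n) / (1 + df(t))) + 1
        self.idf = [math.log((1 + n) / (1 + df[t])) + 1 for t in self.vocabulary]
        return self

    def transform(self, docs):
        rows = []
        for doc in docs:
            # Terms unseen during fit are ignored.
            counts = Counter(t for t in tokenize(doc) if t in self.index)
            row = [0.0] * len(self.vocabulary)
            for term, tf in counts.items():
                j = self.index[term]
                row[j] = tf * self.idf[j]
            norm = math.sqrt(sum(v * v for v in row))
            rows.append([v / norm for v in row] if norm else row)
        return rows


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LogisticRegressionGD:
    """Binary logistic regression fitted by full-batch gradient descent on log-loss."""

    def __init__(self, lr=0.5, epochs=500):
        self.lr = lr
        self.epochs = epochs

    def fit(self, X, y):
        m, d = len(X), len(X[0])
        self.w, self.b = [0.0] * d, 0.0
        for _ in range(self.epochs):
            grad_w, grad_b = [0.0] * d, 0.0
            for xi, yi in zip(X, y):
                # d(log-loss)/d(logit) = predicted probability - label
                err = self.predict_proba(xi) - yi
                for j, v in enumerate(xi):
                    grad_w[j] += err * v
                grad_b += err
            self.w = [w - self.lr * g / m for w, g in zip(self.w, grad_w)]
            self.b -= self.lr * grad_b / m
        return self

    def predict_proba(self, x):
        return sigmoid(sum(w * v for w, v in zip(self.w, x)) + self.b)

    def predict(self, x):
        return int(self.predict_proba(x) >= 0.5)


if __name__ == "__main__":
    # Hypothetical toy posts (1 = bullying), not taken from the paper's datasets.
    docs = [
        "you are so stupid and ugly",
        "nobody likes you loser",
        "great game last night",
        "thanks for sharing this article",
    ]
    labels = [1, 1, 0, 0]
    vectorizer = SimpleTfidf().fit(docs)
    X = vectorizer.transform(docs)
    clf = LogisticRegressionGD().fit(X, labels)
    print([clf.predict(x) for x in X])
```

For real experiments, scikit-learn's `TfidfVectorizer` and `LogisticRegression` add regularisation and stronger optimisers; the hand-written version only makes the feature extraction and training arithmetic visible.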

Related Work

Methodology

Preprocessing and Feature Extraction

Traditional Machine Learning Approaches

Neural Network Approaches

Implementation Details

Experimental Result Analysis

Datasets

Result Analysis

Baseline Comparison

Method

Findings

Conclusions and Future Prospects


Journal: Electronics	Publication Date: Nov 16, 2021
Citations: 35	License type: CC BY 4.0
