Hybrid ensemble approaches to online harassment detection in highly imbalanced data

Marwa Tolba, Salima Ouadfel, Souham Meshoul

doi:10.1016/j.eswa.2021.114751

Abstract

Online harassment is a major threat to users of social media platforms, especially young adults and women. It can cause mental illness, and it can deeply harm economic institutions targeted by cyberbullying attacks, which may lose credibility and business. This makes the automatic detection of online harassment extremely important. Most current studies in this area apply machine-learning algorithms that assume a balanced class distribution; however, this assumption does not hold for most real datasets. This research provides a comprehensive investigation of approaches that combine diverse techniques along three dimensions: feature representation, imbalanced-data handling, and supervised learning. For the first dimension, three word-embedding models are considered: word2vec, GloVe, and SSWE. For the other two dimensions, nine techniques for balancing skewed class distributions are employed to feed several learning models. In particular, resampling methods, cost-sensitive learning, and Weight-Selection strategy-based methods are used with deep neural networks. The ultimate goal of this study is to evaluate the potential of such hybrid approaches to handle online harassment detection efficiently on highly imbalanced Twitter data, and to select the best combination for this purpose. An extensive comparative study has been conducted, and the results are discussed in terms of three evaluation metrics widely used for imbalanced classification. The main findings are that GloVe is the best feature representation and that the best-performing combinations are, most notably, LSTM and BLSTM with cost-sensitive learning and the VL strategy.
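To make the imbalanced-data dimension more concrete, the sketch below illustrates one resampling method, one cost-sensitive weighting scheme, and a few imbalance-aware metrics on a binary task (1 = harassment, 0 = not harassment). The abstract does not say which nine balancing techniques or which three metrics the authors used, so the choices here are assumptions: random oversampling, inverse-frequency class weights (the same formula scikit-learn uses for `class_weight="balanced"`), a class-weighted binary cross-entropy, and precision, recall, F1, and G-mean. All function names are hypothetical and this is not the authors' implementation.

```python
# Hypothetical helpers for illustration only; not the authors' code.
import math
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count), so rare classes cost more."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def weighted_bce(y_true, p_pred, weights, eps=1e-12):
    """Mean binary cross-entropy where each sample's loss is scaled by its class weight."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip probabilities to avoid log(0)
        loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
        total += weights[y] * loss
    return total / len(y_true)


def random_oversample(xs, ys, seed=0):
    """Duplicate randomly chosen minority-class samples until both classes have equal size."""
    rng = random.Random(seed)
    counts = Counter(ys)
    minority = min(counts, key=counts.get)
    deficit = max(counts.values()) - counts[minority]
    pool = [x for x, y in zip(xs, ys) if y == minority]
    extra = [rng.choice(pool) for _ in range(deficit)]
    return list(xs) + extra, list(ys) + [minority] * deficit


def imbalance_metrics(y_true, y_pred):
    """Precision, recall and F1 for the positive class, plus G-mean of recall and specificity."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    g_mean = math.sqrt(recall * specificity)
    return {"precision": precision, "recall": recall, "f1": f1, "g_mean": g_mean}


if __name__ == "__main__":
    labels = [0] * 9 + [1]  # toy data: 9 benign tweets, 1 harassing tweet
    print(inverse_frequency_weights(labels))
    print(imbalance_metrics([1, 1, 0, 0, 0, 0], [1, 0, 1, 0, 0, 0]))
```

With 9 benign tweets and 1 harassing tweet, the minority class receives a weight of 5.0 and the majority class about 0.56, so misclassifying a harassing tweet costs roughly nine times more. In a deep-learning setting the same per-class dictionary could, for example, be passed to the `class_weight` argument of Keras `Model.fit`; resampling instead changes the training data itself and leaves the loss unweighted.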

Journal: Expert Systems with Applications
Publication Date: Feb 25, 2021
Citations: 21

Similar Papers

A Comparison Study of Cost-Sensitive Learning and Sampling Methods on Imbalanced Data Sets
Jin Wei Zhang ... Yi Lu
Advanced Materials Research | VOL. 271-273
01 Jul 2011

Cost-Sensitive Hypergraph Learning With F-Measure Optimization
Nan Wang ... Ruozhou Liang
IEEE Transactions on Cybernetics | VOL. 53
01 May 2023

Cost-sensitive learning for imbalanced medical data: a review
Imane Araf ... Ikram Chairi
Artificial Intelligence Review | VOL. 57
01 Mar 2024

Cost-sensitive learning strategies for high-dimensional and imbalanced data: a comparative study
Barbara Pes ... Giuseppina Lai
PeerJ Computer Science | VOL. 7
24 Dec 2021