Tackling cyber-aggression: Identification and fine-grained categorization of aggressive texts on social media using weighted ensemble of transformers

Omar Sharif, Mohammed Moshiul Hoque

doi:10.1016/j.neucom.2021.12.022

Abstract

The pervasiveness of aggressive content on social media has become a serious concern for government organizations and tech companies because of its pernicious societal effects. In recent years, social media has repeatedly been used as a tool to incite communal aggression, spread distorted propaganda, damage social harmony and demean the identity of individuals or communities in public spaces. Detecting aggressive content and restraining its proliferation has therefore become an urgent task. Studies on identifying aggressive content have mostly addressed English and other high-resource languages, and automatic systems developed for those languages cannot accurately identify detrimental content written in regional languages such as Bengali. To address this gap, this work presents a novel Bengali aggressive text dataset (called ‘BAD’) with two-level annotation. In level-A, 14158 texts are labeled as either aggressive or non-aggressive. In level-B, 6807 aggressive texts are categorized into religious, political, verbal and gendered aggression classes, which contain 2217, 2085, 2043 and 462 texts respectively. This paper proposes a weighted ensemble technique that uses m-BERT, distil-BERT, Bangla-BERT and XLM-R as base classifiers to identify and classify aggressive texts in Bengali. The proposed model readjusts the softmax probabilities of the participating classifiers depending on their primary outcomes. This weighting technique enables the model to outperform the simple average ensemble and all other machine learning (ML) and deep learning (DL) baselines. It achieves the highest weighted F1-score of 93.43% in the identification task and 93.11% in the categorization task. The dataset developed as part of this work is available at https://github.com/BAD-Bangla-Aggressive-Text-Dataset
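The abstract does not give the exact weighting formula, so the following is only a minimal sketch of one common way to build a weighted ensemble over softmax outputs. It assumes each base classifier's weight is proportional to a prior score (for example, a validation F1-score); the function names `weighted_ensemble` and `predict_class` and all numeric values are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def weighted_ensemble(
    probs_per_model: Sequence[Sequence[float]],
    prior_scores: Sequence[float],
) -> List[float]:
    """Combine per-model softmax vectors for one text.

    Assumption (not from the paper): each model's weight is its prior
    score divided by the sum of all prior scores.
    """
    if len(probs_per_model) != len(prior_scores):
        raise ValueError("need exactly one prior score per model")
    total = sum(prior_scores)
    if total <= 0:
        raise ValueError("prior scores must sum to a positive value")

    weights = [score / total for score in prior_scores]
    n_classes = len(probs_per_model[0])
    combined = [0.0] * n_classes
    for weight, probs in zip(weights, probs_per_model):
        if len(probs) != n_classes:
            raise ValueError("all models must output the same number of classes")
        for k, p in enumerate(probs):
            combined[k] += weight * p
    return combined


def predict_class(combined: Sequence[float]) -> int:
    """Return the index of the class with the highest combined probability."""
    return max(range(len(combined)), key=lambda k: combined[k])


if __name__ == "__main__":
    # Hypothetical softmax outputs [non-aggressive, aggressive] from four
    # base classifiers, listed in the order m-BERT, distil-BERT,
    # Bangla-BERT, XLM-R. The values are illustrative only.
    softmax_outputs = [
        [0.55, 0.45],
        [0.60, 0.40],
        [0.30, 0.70],
        [0.20, 0.80],
    ]
    # Hypothetical prior scores, e.g. validation F1 of each model.
    prior_scores = [0.90, 0.88, 0.92, 0.93]
    combined = weighted_ensemble(softmax_outputs, prior_scores)
    print(combined, predict_class(combined))
```

With equal prior scores this reduces to the simple average ensemble that the abstract uses as a comparison point; unequal scores shift the combined distribution toward the classifiers with the stronger prior results.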

Journal: Neurocomputing
Publication Date: Dec 18, 2021
Citations: 27
