Abstract

This study addresses the urgent issue of toxic language on social media, focusing on the detection of toxic comments on popular Italian Facebook pages. We build on the LiLaH project, which provides a standardized framework for analyzing hateful content in multiple languages, including Dutch, English, French, Slovene, and Croatian. We first examine the linguistic features of Italian toxic language on social media. Our analysis reveals that toxic comments in Italian tend to be longer and to contain fewer unique emojis than non-toxic comments, while both exhibit similar lexical diversity. To evaluate the impact of these linguistic features on state-of-the-art models’ performance, we fine-tune three pre-trained language models (PoliBERT, UmBERTo, and bert-base-italian-xxl-uncased). Despite their significant correlation with comment toxicity, adding the linguistic features worsens the best model’s performance.
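To make the feature analysis concrete, the sketch below shows one plausible way to compute the per-comment linguistic features named in the abstract (length, unique-emoji count, and lexical diversity as a type-token ratio). It is an illustrative assumption, not the paper's actual code: the function and field names (`extract_features`, `CommentFeatures`) and the emoji regex are ours, and a real pipeline might use a dedicated emoji library and a proper tokenizer instead.

```python
# Minimal sketch (assumed implementation, not the authors' code) of the
# per-comment linguistic features discussed in the abstract.
import re
from dataclasses import dataclass

# Broad emoji ranges; a production system would likely use the `emoji`
# package, but this keeps the sketch dependency-free.
EMOJI_PATTERN = re.compile(
    "[\U0001F300-\U0001FAFF\U00002600-\U000027BF\U0001F1E6-\U0001F1FF]",
    flags=re.UNICODE,
)

@dataclass
class CommentFeatures:
    n_tokens: int            # comment length in whitespace tokens
    n_unique_emojis: int     # number of distinct emojis in the comment
    type_token_ratio: float  # lexical diversity: unique tokens / total tokens

def extract_features(text: str) -> CommentFeatures:
    tokens = text.split()
    unique_emojis = set(EMOJI_PATTERN.findall(text))
    ttr = len({t.lower() for t in tokens}) / len(tokens) if tokens else 0.0
    return CommentFeatures(
        n_tokens=len(tokens),
        n_unique_emojis=len(unique_emojis),
        type_token_ratio=ttr,
    )

if __name__ == "__main__":
    # Example Italian comment with repeated emojis.
    print(extract_features("Che vergogna!!! 😡😡🤬 andate via"))
```

Feature vectors like these could then be concatenated to the transformer's pooled representation before the classification head, which is the kind of feature-injection setup the abstract reports as hurting, rather than helping, the best model.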
