NewsCom-TOX: a corpus of comments on news articles annotated for toxicity in Spanish

Mariona Taulé, Víctor Bargiela, Xavier Bonet, Montserrat Nofre

doi:10.1007/s10579-023-09711-x

Abstract

In this article, we present NewsCom-TOX, a new corpus manually annotated for toxicity in Spanish. NewsCom-TOX consists of 4359 comments in Spanish posted in response to 21 news articles on social media related to immigration, compiled in order to analyse and identify messages with racial and xenophobic content. The corpus is annotated at multiple levels with binary linguistic categories (stance, target, stereotype, sarcasm, mockery, insult, improper language, aggressiveness and intolerance). The annotation takes into account not only the information conveyed in each comment, but also the whole discourse thread in which the comment occurs and the information conveyed in the news article, including its images. These categories allow us to identify both the presence of toxicity and its intensity, that is, the level of toxicity of each comment. All this information is available for research purposes upon request. Here we describe the NewsCom-TOX corpus, the annotation tagset used, the criteria applied and the annotation process carried out, including the inter-annotator agreement tests conducted. A quantitative analysis of the results obtained is also provided. NewsCom-TOX is a linguistic resource that will be valuable for both linguistic and computational research in Spanish, in particular for NLP tasks concerned with the detection of toxic information.

Journal: Language Resources and Evaluation
Publication Date: Jan 17, 2024
Citations: 1
License type: CC BY 4.0
