Abstract

Warning: This paper contains abusive samples that may cause discomfort to readers.

Abusive language on social media reinforces prejudice against individuals or specific groups of people and greatly hampers freedom of expression. With the rise of large-scale pre-trained language models, classification based on pre-trained language models has gradually become the prevailing paradigm for automatic abusive language detection. However, the effect of stereotypes inherent in language models on abusive language detection remains unknown, even though such stereotypes may further reinforce biases against minorities. To this end, we use multiple metrics to measure the presence of bias in language models and analyze the impact of these inherent biases on automatic abusive language detection. Based on this quantitative analysis, we propose two debiasing strategies, token debiasing and sentence debiasing, which are applied jointly to reduce the bias of language models in abusive language detection without degrading classification performance. Specifically, the token debiasing strategy reduces the language model's discrimination against the protected attribute terms of a given group through random probability estimation. The sentence debiasing strategy replaces protected attribute terms and augments the original text via counterfactual augmentation to obtain debiased samples, then uses consistency regularization between the original data and the augmented samples to eliminate sentence-level bias in the language model. Experimental results confirm that our method not only reduces the bias of the language model in the abusive language detection task but also effectively improves abusive language detection performance.
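
The abstract gives only a high-level description of the two strategies; the PyTorch sketch below is one plausible reading of them, not the authors' implementation. The term list PROTECTED_TERM_PAIRS, the masking probability p, the weight lam, and the helper functions are all hypothetical, and "random probability estimation" is approximated here by randomly masking protected attribute terms.

```python
# Minimal sketch (assumptions noted above), not the authors' released code.
# Assumes `model` is any classifier that maps a batch of encoded inputs to class logits.
import random
import torch.nn.functional as F

# Hypothetical mapping between protected attribute terms; the real term lists
# would come from the paper's resources.
PROTECTED_TERM_PAIRS = {"women": "men", "men": "women",
                        "muslim": "christian", "christian": "muslim"}

def counterfactual_augment(tokens):
    """Sentence-level step: swap protected attribute terms to build a
    counterfactual copy of the input text."""
    return [PROTECTED_TERM_PAIRS.get(t.lower(), t) for t in tokens]

def mask_protected_terms(tokens, mask_token="[MASK]", p=0.5):
    """Token-level step (one plausible reading of 'random probability
    estimation'): randomly hide protected attribute terms so the classifier
    cannot use them as shortcuts."""
    return [mask_token if t.lower() in PROTECTED_TERM_PAIRS and random.random() < p
            else t
            for t in tokens]

def consistency_loss(logits_orig, logits_aug):
    """Symmetric KL divergence between predictions on the original and the
    counterfactually augmented sample."""
    p = F.log_softmax(logits_orig, dim=-1)
    q = F.log_softmax(logits_aug, dim=-1)
    return 0.5 * (F.kl_div(p, q, log_target=True, reduction="batchmean")
                  + F.kl_div(q, p, log_target=True, reduction="batchmean"))

def training_loss(model, batch_orig, batch_aug, labels, lam=1.0):
    """Joint objective: standard classification loss on the original inputs
    plus the consistency regularizer tying the two predictions together."""
    logits_orig = model(batch_orig)
    logits_aug = model(batch_aug)
    return F.cross_entropy(logits_orig, labels) + lam * consistency_loss(logits_orig, logits_aug)
```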
