Abstract

Hate speech is a persistent problem on social media. Researchers have analyzed it and developed detection methods on the basis of example data, even though the phenomenon itself is only vaguely defined. This paper presents an approach that identifies hate speech in terms of German law, which serves as the basis for annotation guidelines applied to real-world data. We annotate a corpus of 1,385 German short text messages with six labels: four subcategories of illegal hate speech, offensive language, and a neutral class. We consider an expression of hate speech illegal if its linguistic content, interpreted in a given context, could violate a specific law; this interpretation, together with a review by lawyers, would be the next step and is not yet part of our annotation. We also report on strategies for avoiding certain biases in data for illegal hate speech, which may serve as a model for building a larger dataset. In experiments, we investigate the capability of a Transformer-based neural network model to learn our classification. The results show that this multiclass classification is still difficult to learn, probably owing to the small size of the dataset. We argue that it is crucial to be aware of data biases and to apply bias-mitigation techniques when training hate speech detection systems on such data. The data and experiment scripts are publicly available.
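The abstract notes that the six-way classification is hard to learn on a small, skewed corpus. One common mitigation for such class imbalance (not necessarily the technique used in the paper) is to weight each class inversely to its frequency during training. A minimal sketch, using hypothetical label names and counts for illustration:

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Per-class weights inversely proportional to class frequency,
    normalized so that the weighted sample count equals len(labels)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * cnt) for cls, cnt in counts.items()}

# Hypothetical distribution over a 1,385-message corpus; the real label
# set and counts are defined in the paper, not here.
labels = (["neutral"] * 700 + ["offensive"] * 400
          + ["insult"] * 150 + ["incitement"] * 135)
weights = inverse_frequency_weights(labels)
# Rare classes receive larger weights than frequent ones.
```

Such a weight map can then be passed to a weighted loss function (e.g. a class-weighted cross-entropy) when fine-tuning a Transformer classifier.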


