Abstract

The automatic detection of conflictual languages (harmful, aggressive, abusive, and offensive languages) is essential to provide a healthy conversation environment on the Web. To design and develop detection systems capable of achieving satisfactory performance, a thorough understanding of the nature and properties of the targeted type of conflictual language is of great importance. The scientific communities investigating human psychology and social behavior have studied these languages in detail, but their insights have only partially reached the computer science community. In this survey, we aim both to systematically characterize the conceptual properties of online conflictual languages, and to investigate the extent to which they are reflected in state-of-the-art automatic detection systems. Through an analysis of the psychology literature, we provide a reconciled taxonomy that denotes the ensemble of conflictual languages typically studied in computer science. We then characterize the conceptual mismatches between the main semantic and contextual properties of these languages and their treatment in computer science works, and systematically uncover the resulting technical biases in the design of machine learning classification models and the datasets created for their training. Finally, we discuss diverse research opportunities for the computer science community and reflect on broader technical and structural issues.

Highlights

  • Harmful, aggressive, abusive, and offensive languages in online communications are a growing concern [115, 187, 287]

  • We focus on online conflictual languages (OCL)

  • We use OCL as an umbrella term for the multitude of hate-related languages, and explain which of them the survey targets


Summary

Introduction

Aggressive, abusive, and offensive languages in online communications are a growing concern [115, 187, 287]. The choice of data source, of keywords for retrieving initial sets of samples, and of languages for these queries directly impacts the types of users for whom the subsequently trained model will show good performance. Less obvious choices also skew the data distribution, for instance selecting random samples from a forum's history versus selecting only the first posts; in both cases, the topics discussed might be more or less detailed, or the authors of posts might use more or less strong OCL (see the first sketch below). The choice of features automatically biases the model towards using certain types of information and biases its outputs towards specific types of errors. This is a measurement bias [261], where the choice of features might leave out factors that are relevant for inference. Finally, sharing some information across models while fine-tuning them for specific contexts remains to be investigated, so that training does not require too large an amount of data or too many computational resources (see the second sketch below).
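The first sketch below is not from the survey; it illustrates, on an invented toy corpus with assumed columns, how the two sampling choices mentioned above draw different training distributions from the same forum history.

```python
# Minimal sketch (illustrative, not from the survey): two plausible
# sampling choices over the same forum corpus yield different data.
import pandas as pd

# Toy forum history; thread_id, position, and text are assumed columns.
posts = pd.DataFrame({
    "thread_id": [1, 1, 1, 2, 2, 3],
    "position":  [0, 1, 2, 0, 1, 0],   # position of the post in its thread
    "text": ["opening post", "heated reply", "insult",
             "opening post", "calm reply", "opening post"],
})

# Choice A: uniform random sample over the whole forum history.
random_sample = posts.sample(n=3, random_state=0)

# Choice B: only the first post of each thread.
first_posts = posts[posts["position"] == 0]

# The two samples cover different mixes of topics and OCL intensity,
# so a model trained on either inherits that skew.
print(random_sample["text"].tolist())
print(first_posts["text"].tolist())
```

The second sketch shows one possible reading of the last point, under stated assumptions: a single shared encoder fine-tuned with lightweight, context-specific classification heads. The encoder checkpoint, the two contexts ("forum", "microblog"), and the binary labels are illustrative assumptions, not the survey's prescription.

```python
# Minimal sketch (assumptions labeled above): one shared encoder,
# one small classification head per context.
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class SharedEncoderClassifier(nn.Module):
    def __init__(self, encoder_name, contexts, num_labels=2):
        super().__init__()
        # One encoder shared by every context; only the heads differ.
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        self.heads = nn.ModuleDict(
            {ctx: nn.Linear(hidden, num_labels) for ctx in contexts}
        )

    def forward(self, input_ids, attention_mask, context):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        pooled = out.last_hidden_state[:, 0]  # first-token ([CLS]) representation
        return self.heads[context](pooled)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = SharedEncoderClassifier("bert-base-uncased",
                                contexts=["forum", "microblog"])

batch = tokenizer(["an example post"], return_tensors="pt",
                  padding=True, truncation=True)
logits = model(batch["input_ids"], batch["attention_mask"], context="forum")
```

Because only the small heads are context-specific, each added context contributes few parameters, which is one way to limit the data and compute needed per context.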
