Abstract

Technologies for abusive language detection are being developed and applied with little consideration of their potential biases. We examine racial bias in five different sets of Twitter data annotated for hate speech and abusive language. We train classifiers on these datasets and compare the predictions of these classifiers on tweets written in African-American English with those written in Standard American English. The results show evidence of systematic racial bias in all datasets, as classifiers trained on them tend to predict that tweets written in African-American English are abusive at substantially higher rates. If these abusive language detection systems are used in the field, they will therefore have a disproportionate negative impact on African-American social media users. Consequently, these systems may discriminate against the groups who are often the targets of the abuse we are trying to detect.

Highlights

  • Recent work has shown evidence of substantial bias in machine learning systems, which is typically a result of bias in the training data

  • Our study focuses on racial bias in hate speech and abusive language detection datasets (Waseem, 2016; Waseem and Hovy, 2016; Davidson et al., 2017; Golbeck et al., 2017; Founta et al., 2018), all of which use data collected from Twitter

  • We train classifiers using each of the datasets and use a corpus of tweets with demographic information to compare how each classifier performs on tweets written in African-American English (AAE) versus Standard American English (SAE) (Blodgett et al., 2016); a sketch of this comparison follows this list
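
A minimal sketch of this kind of audit, assuming a simple bag-of-words pipeline (TF-IDF features with logistic regression, a plausible stand-in rather than the authors' exact models) and hypothetical CSV files holding the annotated training tweets and the dialect-aligned AAE and SAE corpora:

```python
# Minimal sketch of the audit described above (not the authors' released code).
# A text classifier is trained on an annotated tweet dataset and then applied
# to two dialect-aligned corpora; the rate at which each corpus is flagged as
# abusive is compared. File names and column names are hypothetical.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labeled data: one row per tweet, label 1 = abusive/hateful, 0 = not.
train = pd.read_csv("annotated_tweets.csv")      # columns: text, label
aae = pd.read_csv("aae_tweets.csv")["text"]      # tweets inferred to be AAE
sae = pd.read_csv("sae_tweets.csv")["text"]      # tweets inferred to be SAE

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=5),  # unigram/bigram TF-IDF features
    LogisticRegression(max_iter=1000),              # regularized logistic regression
)
clf.fit(train["text"], train["label"])

# Proportion of each dialect corpus predicted to be abusive.
p_aae = clf.predict(aae).mean()
p_sae = clf.predict(sae).mean()
print(f"Predicted abusive rate: AAE={p_aae:.3f}  SAE={p_sae:.3f}  ratio={p_aae / p_sae:.2f}")
```

The quantity of interest is the ratio of predicted-abusive rates between the two dialect corpora; a ratio well above 1 is the kind of disparity the study reports.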

Introduction

Recent work has shown evidence of substantial bias in machine learning systems, which is typically a result of bias in the training data. Machine learning models are currently being deployed in the field to detect hate speech and abusive language on social media platforms including Facebook, Instagram, and YouTube. The aim of these models is to identify abusive language that directly targets others, particularly individuals or groups belonging to protected categories (Waseem et al., 2017). In most cases the magnitude of the bias decreases when we condition on particular keywords that may indicate membership in the negative (abusive) classes, yet it still persists. We expect that these biases will result in racial discrimination if classifiers trained on any of these datasets are deployed in the field.
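
The keyword-conditioning check mentioned above can be illustrated with a short, hypothetical extension of the earlier sketch: restrict both dialect corpora to tweets containing a given keyword and compare predicted-abusive rates within that subset. The function name and keyword strings below are illustrative placeholders, not taken from the paper:

```python
# Illustrative keyword-conditioning check (names are placeholders, not the
# paper's code). Reuses the fitted pipeline `clf` and the dialect corpora
# `aae` and `sae` from the earlier sketch.
def conditional_rates(clf, aae_texts, sae_texts, keyword):
    """Predicted-abusive rates for AAE and SAE tweets that contain `keyword`."""
    aae_sub = [t for t in aae_texts if keyword in t.lower()]
    sae_sub = [t for t in sae_texts if keyword in t.lower()]
    if not aae_sub or not sae_sub:
        return None  # keyword missing from one corpus; nothing to compare
    return clf.predict(aae_sub).mean(), clf.predict(sae_sub).mean()

# Placeholder keywords standing in for the specific terms the study conditions on.
for kw in ["keyword_a", "keyword_b"]:
    rates = conditional_rates(clf, list(aae), list(sae), kw)
    if rates:
        print(f"{kw}: AAE={rates[0]:.3f}  SAE={rates[1]:.3f}")
```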
