Abstract
Technologies for abusive language detection are being developed and applied with little consideration of their potential biases. We examine racial bias in five different sets of Twitter data annotated for hate speech and abusive language. We train classifiers on these datasets and compare the predictions of these classifiers on tweets written in African-American English with those written in Standard American English. The results show evidence of systematic racial bias in all datasets, as classifiers trained on them tend to predict that tweets written in African-American English are abusive at substantially higher rates. If these abusive language detection systems are used in the field, they will therefore have a disproportionate negative impact on African-American social media users. Consequently, these systems may discriminate against the groups who are often the targets of the abuse we are trying to detect.
Highlights
Recent work has shown evidence of substantial bias in machine learning systems, which is typically a result of bias in the training data
Our study focuses on racial bias in hate speech and abusive language detection datasets (Waseem, 2016; Waseem and Hovy, 2016; Davidson et al., 2017; Golbeck et al., 2017; Founta et al., 2018), all of which use data collected from Twitter
We train classifiers using each of the datasets and use a corpus of tweets with demographic information to compare how each classifier performs on tweets written in African-American English (AAE) versus Standard American English (SAE) (Blodgett et al., 2016)
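A minimal sketch of this comparison is given below. It is not the authors' released code: the file names, column names, and the simple TF-IDF plus logistic regression pipeline are illustrative assumptions. It trains a classifier on one labeled abuse dataset and then compares the fraction of tweets predicted abusive in an AAE corpus versus an SAE corpus.

    # Minimal sketch of the AAE vs. SAE comparison; file and column names are
    # hypothetical placeholders, not the datasets' actual formats.
    import pandas as pd
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Labeled training data: one annotated Twitter dataset, reduced here to a
    # binary label (1 = abusive/hateful, 0 = neither).
    train = pd.read_csv("labeled_abuse_dataset.csv")  # columns: tweet, label

    clf = make_pipeline(TfidfVectorizer(min_df=2), LogisticRegression(max_iter=1000))
    clf.fit(train["tweet"], train["label"])

    # Demographically aligned tweets (e.g., derived from Blodgett et al., 2016),
    # one tweet per line in each file.
    aae = open("aae_tweets.txt", encoding="utf-8").read().splitlines()
    sae = open("sae_tweets.txt", encoding="utf-8").read().splitlines()

    # Compare how often each group's tweets are classified as abusive.
    rate_aae = clf.predict(aae).mean()
    rate_sae = clf.predict(sae).mean()
    print(f"Predicted abusive - AAE: {rate_aae:.3f}  SAE: {rate_sae:.3f}  "
          f"ratio: {rate_aae / rate_sae:.2f}")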
Summary
Recent work has shown evidence of substantial bias in machine learning systems, which is typically a result of bias in the training data. Machine learning models are currently being deployed in the field to detect hate speech and abusive language on social media platforms including Facebook, Instagram, and YouTube. The aim of these models is to identify abusive language that directly targets certain individuals or groups, particularly people belonging to protected categories (Waseem et al., 2017). Our experiments show systematic racial bias in all five datasets: classifiers trained on them predict that tweets written in African-American English are abusive at substantially higher rates than tweets written in Standard American English. In most cases the bias decreases in magnitude when we condition on particular keywords that may indicate membership in negative classes, yet it still persists. We expect that these biases will result in racial discrimination if classifiers trained on any of these datasets are deployed in the field.
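As a rough illustration of the keyword-conditioned check described above, the hypothetical helpers below restrict both corpora to tweets containing a given keyword before recomputing the predicted-abusive rates. The keyword placeholders and the variable names (which continue from the earlier sketch) are assumptions, not the paper's exact procedure.

    def abusive_rate(clf, tweets):
        """Fraction of tweets the fitted classifier labels abusive (label 1)."""
        return float(clf.predict(tweets).mean()) if tweets else float("nan")

    def conditioned_rates(clf, aae_tweets, sae_tweets, keyword):
        """Recompute AAE vs. SAE rates among tweets that contain `keyword`."""
        aae_kw = [t for t in aae_tweets if keyword in t.lower()]
        sae_kw = [t for t in sae_tweets if keyword in t.lower()]
        return abusive_rate(clf, aae_kw), abusive_rate(clf, sae_kw)

    # Usage, with clf, aae, and sae from the earlier sketch and placeholder keywords:
    for kw in ["keyword_a", "keyword_b"]:
        r_aae, r_sae = conditioned_rates(clf, aae, sae, kw)
        print(f"{kw}: AAE {r_aae:.3f} vs. SAE {r_sae:.3f}")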