Abstract

Machine learning techniques have been widely used in natural language processing (NLP). However, as revealed by many recent studies, machine learning models often inherit and amplify the societal biases in data. Various metrics have been proposed to quantify biases in model predictions. In particular, several of them evaluate disparity in model performance between protected groups and advantaged groups in the test corpus. However, we argue that evaluating bias at the corpus level is not enough for understanding how biases are embedded in a model. In fact, a model with similar aggregated performance between different groups on the entire data may behave differently on instances in a local region. To analyze and detect such local bias, we propose LOGAN, a new bias detection technique based on clustering. Experiments on toxicity classification and object classification tasks show that LOGAN identifies bias in a local region and allows us to better analyze the biases in model predictions.
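As a concrete illustration of the corpus-level metrics the abstract refers to, the sketch below computes an accuracy gap between a protected group and an advantaged group over an entire test set. This is a minimal sketch for illustration only: the array names, the binary group encoding, and the choice of accuracy as the performance measure are assumptions, not the paper's exact formulation.

```python
# Minimal sketch (not the authors' code): a corpus-level ("global") bias metric,
# measured as the accuracy gap between two demographic groups.
import numpy as np

def accuracy(y_true, y_pred):
    return float(np.mean(y_true == y_pred))

def global_performance_gap(y_true, y_pred, group):
    """Accuracy difference between group 0 and group 1 over the whole corpus.

    `group` is assumed to be a NumPy array with a 0/1 label per instance.
    """
    g0, g1 = group == 0, group == 1
    return accuracy(y_true[g0], y_pred[g0]) - accuracy(y_true[g1], y_pred[g1])
```

A small corpus-level gap can still hide large gaps on subsets of the data, which is the motivation for the local analysis described below.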

Highlights

  • Machine learning models such as deep neural networks have achieved remarkable performance in many natural language processing (NLP) tasks

  • We argue that studying algorithmic fairness at either the corpus level or the individual level alone does not tell the full story

  • To detect local group bias, we propose LOGAN, a LOcal Group biAs detectioN algorithm to identify biases in local regions


Summary

Introduction

Machine learning models such as deep neural networks have achieved remarkable performance in many NLP tasks. However, their predictions can be biased in ways that surface only on particular subsets of the data: for example, a model may be more likely to produce an output of "man cooking" when the agent in the image wears a chef hat. We call such biases, exhibited in a neighborhood of instances, local group bias, in contrast with global group bias, which is evaluated on the entire corpus. LOGAN adapts a clustering algorithm (e.g., K-Means) to group instances based on their features while maximizing a bias metric (e.g., the performance gap across groups) within each cluster. In this way, local group bias is highlighted, allowing a developer to examine the issue further. We find that different topics lead to different levels of local group bias in the toxicity classification task.
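To make the idea concrete, the sketch below shows a simplified, two-step variant of local group bias detection: run plain K-Means on the instance features, then measure the performance gap per cluster. Note that this is not LOGAN itself, which folds the bias metric into the clustering objective; the variable names (X, y_true, y_pred, group) are placeholders, not identifiers from the paper's released code.

```python
# Illustrative sketch of exposing local group bias via clustering.
# LOGAN jointly optimizes cluster assignments and a bias metric; this simplified
# version instead clusters first with standard K-Means and then computes the
# per-cluster accuracy gap between two groups.
import numpy as np
from sklearn.cluster import KMeans

def local_performance_gaps(X, y_true, y_pred, group, n_clusters=10, seed=0):
    """Return the accuracy gap between group 0 and group 1 inside each cluster."""
    labels = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10).fit_predict(X)
    gaps = {}
    for c in range(n_clusters):
        idx = labels == c
        g0 = idx & (group == 0)
        g1 = idx & (group == 1)
        if g0.sum() == 0 or g1.sum() == 0:
            continue  # skip clusters that contain only one of the groups
        acc0 = float(np.mean(y_true[g0] == y_pred[g0]))
        acc1 = float(np.mean(y_true[g1] == y_pred[g1]))
        gaps[c] = acc0 - acc1
    return gaps

# Clusters whose gap is much larger than the corpus-level gap point to local group
# bias worth inspecting, e.g., by looking at the topic words of those clusters.
```

Because LOGAN pushes the bias metric into the clustering objective rather than applying it after the fact, it can surface biased neighborhoods that an off-the-shelf clustering like the one above might split across clusters.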

Related Work
Methodology
Toxicity Classification
Method
Object Classification
Conclusion
Reproducibility
Findings
Topic words in different clusters