Abstract

Question answering (QA) models use retriever and reader systems to answer questions. Because QA systems rely on training data, their responses can reflect or amplify inequity. Many QA models, such as those trained on the SQuAD dataset, are trained and tested on a subset of Wikipedia articles, which encode their own biases and also reproduce real-world inequality. Understanding how training data affects bias in QA systems can inform methods to mitigate inequity. We develop two question sets, one for closed-domain and one for open-domain QA, which use ambiguous questions to probe QA models for bias. We feed our question sets to three deep-learning-based QA systems and evaluate their responses for bias using our metrics. We find that open-domain QA models amplify biases more than their closed-domain counterparts and propose that biases in the retriever surface more readily due to its greater freedom of choice.
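As a concrete illustration of this probing setup, the sketch below asks a SQuAD-style reader an ambiguous question about a passage that mentions one male and one female name, then tallies which gender the model selects. This is a minimal sketch, not the paper's released code: the model checkpoint, name pairs, and template are illustrative assumptions.

```python
from collections import Counter
from itertools import permutations

from transformers import pipeline

# A BERT-style extractive reader fine-tuned on SQuAD (illustrative choice).
reader = pipeline("question-answering",
                  model="distilbert-base-cased-distilled-squad")

TEMPLATE = "{a} and {b} are both computer scientists."
QUESTION = "Who is the computer scientist?"
NAME_PAIRS = [("John", "Mary"), ("James", "Linda")]  # (male, female)

counts = Counter()
for male, female in NAME_PAIRS:
    # Present each pair in both orders so position in the passage
    # cannot explain which name the reader selects.
    for a, b in permutations((male, female)):
        answer = reader(question=QUESTION,
                        context=TEMPLATE.format(a=a, b=b))["answer"]
        if answer == male:
            counts["male"] += 1
        elif answer == female:
            counts["female"] += 1

# The question is ambiguous, so an unbiased reader should split its
# choices roughly evenly between the two genders.
print(counts)
```

A full benchmark would use many more names and templates and normalize extracted spans before matching, but a tally of this kind is the essence of a bias metric over ambiguous questions.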

Highlights

  • We build on prior work to develop two new benchmarks for bias, using questions with multiple answers to reveal model biases

  • We find that, when answering unrestricted ambiguous questions, retriever models amplify the gender bias found in Wikipedia more than reader models do

  • We claim that ambiguous questions can serve as a mechanism for discovering how QA systems contribute to exacerbating or ameliorating inequity in the world

Introduction

Historical inequities have led to the majority of computer science students being male, which could lead question answering (QA) models to assume that all computer science students are male. QA systems thus risk reflecting or amplifying existing inequality apparent in knowledge bases and the real world. This may be through exacerbating empirically-observed inequality, e.g., by providing a list of 90% males for an occupation that is less skewed in reality. We build on prior work to develop two new benchmarks for bias, using questions with multiple answers to reveal model biases. We apply our benchmarks to a set of neural models including BERT (Devlin et al., 2018) and DPR (Karpukhin et al., 2020), test for gender bias, and conclude with a discussion of bias mitigation. We also provide a brief overview of prior work on bias, both in NLP and in QA, along with a description of the negative effects of bias.
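To see why the retriever is a natural place for such bias to surface, note how DPR scores candidates: it ranks every passage in the index by the inner product between question and passage embeddings, so an ambiguous question leaves it free to choose which gendered passage to return. The sketch below, which assumes the Hugging Face DPR checkpoints and uses toy passages rather than the paper's data, scores two passages that differ only in the subject's gender.

```python
import torch
from transformers import (DPRContextEncoder, DPRContextEncoderTokenizer,
                          DPRQuestionEncoder, DPRQuestionEncoderTokenizer)

q_tok = DPRQuestionEncoderTokenizer.from_pretrained(
    "facebook/dpr-question_encoder-single-nq-base")
q_enc = DPRQuestionEncoder.from_pretrained(
    "facebook/dpr-question_encoder-single-nq-base")
c_tok = DPRContextEncoderTokenizer.from_pretrained(
    "facebook/dpr-ctx_encoder-single-nq-base")
c_enc = DPRContextEncoder.from_pretrained(
    "facebook/dpr-ctx_encoder-single-nq-base")

# Two candidate passages identical except for the subject's gender.
passages = [
    "He is a professor of computer science at a large university.",
    "She is a professor of computer science at a large university.",
]
query = "Who is the computer science professor?"

with torch.no_grad():
    q_emb = q_enc(**q_tok(query, return_tensors="pt")).pooler_output
    c_emb = c_enc(**c_tok(passages, return_tensors="pt",
                          padding=True)).pooler_output

# DPR retrieves by inner product between question and passage embeddings;
# a consistent score gap across many such minimal pairs would indicate
# retriever bias, since the question gives no reason to prefer either.
scores = (q_emb @ c_emb.T).squeeze(0)
for passage, score in zip(passages, scores.tolist()):
    print(f"{score:.2f}  {passage}")
```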
