Biases in Large Language Models: Origins, Inventory, and Discussion

Roberto Navigli, Björn Ross, Simone Conia

doi:10.1145/3597307

Open Access

Journal: ACM Journal of Data and Information Quality
Publication Date: Jun 22, 2023
Citations: 30

Affiliation: Sapienza University of Rome

Abstract

In this article, we introduce and discuss the pervasive issue of bias in the large language models that are currently at the core of mainstream approaches to Natural Language Processing (NLP). We first introduce data selection bias, that is, the bias caused by the choice of texts that make up a training corpus. Then, we survey the different types of social bias evidenced in the text generated by language models trained on such corpora, ranging from gender to age, from sexual orientation to ethnicity, and from religion to culture. We conclude with directions focused on measuring, reducing, and tackling the aforementioned types of bias.

Full Text