Computer says ‘no’: Exploring systemic bias in ChatGPT using an audit approach

Louis Lippens

doi:10.1016/j.chbah.2024.100054

Abstract

Large language models offer significant potential for increasing labour productivity, for example by streamlining personnel selection, but raise concerns about perpetuating systemic biases embedded in their pre-training data. This study explores the potential ethnic and gender bias of ChatGPT, a chatbot producing human-like responses to language tasks, in assessing job applicants. Using the correspondence audit approach from the social sciences, I simulated a CV screening task with 34,560 vacancy–CV combinations where the chatbot had to rate fictitious applicant profiles. Comparing ChatGPT's ratings of Arab, Asian, Black American, Central African, Dutch, Eastern European, Hispanic, Turkish, and White American male and female applicants, I show that ethnic and gender identity influence the chatbot's evaluations. Ethnic discrimination is more pronounced than gender discrimination and mainly occurs in jobs with favourable labour conditions or requiring greater language proficiency. In contrast, gender bias emerges in gender-atypical roles. These findings suggest that ChatGPT's discriminatory output reflects a statistical mechanism echoing societal stereotypes. Policymakers and developers should address systemic bias in language model-driven applications to ensure equitable treatment across demographic groups. Practitioners should exercise caution, given the adverse impact these tools can (re)produce, especially in selection decisions involving humans.

Full Text