Objective: To identify and quantify ability bias in generative artificial intelligence large language model chatbots, specifically OpenAI's ChatGPT and Google's Gemini.

Design: Observational study of language usage in generative artificial intelligence models.

Setting: Investigation-only browser profile restricted to ChatGPT and Gemini.

Participants: Each chatbot generated 60 descriptions of people prompted without specified functional status, 30 descriptions of people with a disability, 30 descriptions of patients with a disability, and 30 descriptions of athletes with a disability (N=300).

Interventions: Not applicable.

Main Outcome Measures: Descriptions generated by the models were parsed into words, which were linguistically classified as conveying favorable qualities or limiting qualities.

Results: Both large language models significantly underestimated the prevalence of disability in a population of people, and linguistic analysis showed that both ChatGPT and Gemini generated descriptions of people, patients, and athletes with a disability that contained significantly fewer favorable qualities and significantly more limitations than descriptions of people without a disability.

Conclusions: Generative artificial intelligence chatbots demonstrate quantifiable ability bias and often exclude people with disabilities from their responses. Ethical use of these generative large language model chatbots in medical systems should recognize this limitation, and further consideration should be given to developing equitable artificial intelligence technologies.
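As a rough illustration of the outcome measure described above (tallying words in a generated description as favorable or limiting), the following is a minimal sketch. The word lists, the `score_description` function, and the whitespace-level tokenization are illustrative assumptions, not the authors' actual lexicon or analysis pipeline.

```python
# Hypothetical sketch of classifying description words as favorable vs. limiting.
# The seed lexicons below are assumptions for illustration only.
import re
from collections import Counter

FAVORABLE = {"resilient", "capable", "independent", "skilled", "determined"}
LIMITING = {"dependent", "struggling", "limited", "unable", "confined"}

def score_description(text: str) -> Counter:
    """Count favorable and limiting words in one generated description."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for w in words:
        if w in FAVORABLE:
            counts["favorable"] += 1
        elif w in LIMITING:
            counts["limiting"] += 1
    return counts

if __name__ == "__main__":
    sample = "She is a determined and capable athlete, though often described as limited."
    print(score_description(sample))  # Counter({'favorable': 2, 'limiting': 1})
```

Per-description counts like these could then be compared across prompt conditions (people, patients, and athletes with and without a specified disability) with a standard statistical test; the abstract does not specify which test the authors used.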