Large language model (LLM) chatbots have many applications in medical settings. However, these tools can potentially perpetuate racial and gender biases through their responses, worsening disparities in healthcare. With the ongoing discussion of LLM chatbots in oncology and the widespread goal of addressing cancer disparities, this study focuses on biases propagated by LLM chatbots in oncology. Chat Generative Pre-trained Transformer (Chat GPT; OpenAI, San Francisco, CA, USA) was asked to determine what occupation a generic description of "assesses cancer patients" would correspond to for different demographics. Chat GPT, Gemini (Alphabet Inc., Mountain View, CA, USA), and Bing Chat (Microsoft Corp., Redmond, WA, USA) were prompted to provide oncologist recommendations in the top U.S. cities and demographic information (race, gender) of recommendations was compared against national distributions. Chat GPT was also asked to generate a job description for oncologists with different demographic backgrounds. Finally, Chat GPT, Gemini, and Bing Chat were asked to generate hypothetical cancer patients with race, smoking, and drinking histories. LLM chatbots are about two times more likely to predict Blacksand Native Americans as oncology nurses than oncologists, compared to Asians (p < 0.01 and < 0.001, respectively). Similarly, they are also significantly more likely to predict females than males as oncology nurses (p < 0.001). Chat GPT's real-world oncologist recommendations overrepresent Asians by almost double and underrepresent Blacks by double and Hispanics by seven times. Chatbots also generate different job descriptions based on demographics, including cultural competency and advocacy and excluding treatment administration for underrepresented backgrounds. AI-generated cancer cases are not fully representative of real-world demographic distributions and encode stereotypes on substance abuse, such as Hispanics having a greater proportion of smokers than Whites by about 20% in Chat GPT breast cancer cases. To our knowledge, this is the first study of its kind to investigate racial and gender biases of such a diverse set of AI chatbots, and that too, within oncology. The methodology presented in this study provides a framework for targeted bias evaluation of LLMs in various fields across medicine.
Read full abstract