Artificial intelligence (AI) chatbots have the potential to produce information to support shared prostate cancer (PrCA) decision-making. Therefore, our purpose was to evaluate and compare the accuracy, completeness, readability, and credibility of responses from standard and advanced versions of popular chatbots: ChatGPT-3.5, ChatGPT-4.0, Microsoft Copilot, Microsoft Copilot Pro, Google Gemini, and Google Gemini Advanced. We also investigated whether prompting chatbots for low-literacy PrCA information would improve the readability of responses. Lastly, we determined whether the responses were appropriate for African-American men, who have the worst PrCA outcomes. The study used a cross-sectional approach to examine the quality of responses solicited from chatbots and did not involve human subjects. Eleven frequently asked PrCA questions, based on resources produced by the Centers for Disease Control and Prevention (CDC) and the American Cancer Society (ACS), were posed to each chatbot twice (the second time prompting for a response suited to low-literacy populations). A coding/rating form containing the questions, with key points/answers drawn from the ACS or CDC, was used to facilitate the rating process. Accuracy and completeness were rated dichotomously (i.e., yes/no). Credibility was determined by whether a trustworthy medical or health-related organization was cited. Readability was determined using a Flesch-Kincaid readability score calculator into which chatbot responses were entered individually. Average accuracy, completeness, credibility, and readability percentages or scores were calculated using Excel. All chatbots were accurate, but the completeness, readability, and credibility of responses varied. Soliciting low-literacy responses significantly improved readability, but sometimes at the expense of completeness. All chatbots recognized the higher PrCA risk in African-American men and tailored screening recommendations accordingly.
Microsoft Copilot Pro had the best overall performance on standard screening questions. Microsoft Copilot outperformed other chatbots on responses for low literacy populations. AI chatbots are useful tools for learning about PrCA screening but should be combined with healthcare provider advice.
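The readability scoring described above can be sketched in code. This is a minimal illustration of the Flesch-Kincaid Grade Level formula, not the specific calculator used in the study; the syllable counter is a crude vowel-group heuristic (real calculators typically use pronunciation dictionaries), and all function names are illustrative.

```python
import re

def count_syllables(word: str) -> int:
    """Approximate syllables via vowel groups (heuristic, not dictionary-based)."""
    word = word.lower()
    groups = re.findall(r"[aeiouy]+", word)
    n = len(groups)
    # Drop a trailing silent "e" (e.g., "score"), but keep at least one syllable.
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level:
    0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)
```

Lower grade-level scores indicate easier text, which is why prompting for low-literacy responses would be expected to lower the computed grade.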