e13628

Background: ChatGPT is a conversational artificial intelligence (AI) model that learns from massive text-based datasets and then responds to user input, which often involves completing tasks or answering questions. Recent studies have shown that ChatGPT can pass multiple specialty medical licensing and board examinations, showcasing its promising capabilities in the medical domain. Here, we investigated ChatGPT's potential as a swift and reliable information source for medical oncologists using board examination-style questions and real patient cases.

Methods: We randomly selected 121 board-style questions from the American Society of Clinical Oncology Self-Evaluation Program (ASCO SEP). The questions were entered into ChatGPT as both multiple-choice (MC) and open-ended (OE) prompts. ChatGPT's answers and explanations were evaluated for accuracy and concordance. A non-inferiority analysis was performed with 80% power at α = 0.05 and a non-inferiority margin of 70% correct answers, given a historical board examination pass rate of approximately 65% correct answers. For subgroup analysis, the questions were categorized by tested competency and primary tumor pathology. ChatGPT was also given 10 questions derived from real patient cases, and we compared its responses with the answers provided by experienced oncologists to determine accuracy and practical applicability.

Results: ChatGPT answered 75 of 121 (62.0%) MC queries correctly. Among the correctly answered queries, 2 responses contained faulty explanations. Such inaccurate or discordant explanations were found in 26 of the 46 incorrectly answered queries. In OE prompts, ChatGPT answered 53 (43.8%) questions correctly, with accurate explanations for all of them. Of the 68 incorrect OE responses, 32 contained inaccurate or discordant explanations. Subgroup analysis showed variable performance across categories: the best performance was seen with malignant hematology (81.8% of MC and 72.8% of OE prompts answered correctly), while the weakest performance was seen with genitourinary malignancies (60% of MC and 20% of OE prompts answered correctly). For the real-world patient case questions, ChatGPT's responses were concordant with the clinicians' answers in 5 of the 10 questions. None of the discordant responses contained inaccurate information, while 80% of the concordant responses contained sufficient detail to assist with patient management decisions.

Conclusions: ChatGPT's performance fell short of the non-inferiority margin, highlighting the challenges of incorporating AI into the rapidly evolving field of medical oncology. Despite these limitations, ChatGPT's partial success on both board-style and real-world patient care questions affirms its potential for clinical utility in the future.
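The abstract does not specify the exact test statistic used for the non-inferiority analysis, so the following Python snippet is only an illustrative sketch (not the authors' code). It assumes the criterion is operationalized as a one-sided exact binomial test of the observed MC accuracy against the 70% threshold, using the counts reported above.

```python
# Illustrative sketch: one-sided exact binomial test of ChatGPT's
# multiple-choice accuracy against the 70% non-inferiority threshold.
# This is an assumption about how the criterion could be computed,
# not the analysis actually reported in the study.
from scipy.stats import binomtest

n_questions = 121   # board-style questions sampled from ASCO SEP
n_correct_mc = 75   # MC answers ChatGPT got right (62.0%)
threshold = 0.70    # non-inferiority margin stated in the Methods

# H0: true accuracy <= 0.70 vs. H1: true accuracy > 0.70 (one-sided)
result = binomtest(n_correct_mc, n_questions, p=threshold, alternative="greater")
print(f"Observed accuracy: {n_correct_mc / n_questions:.1%}")
print(f"One-sided p-value vs. 70% threshold: {result.pvalue:.3f}")
# A p-value above 0.05 fails to establish non-inferiority, consistent
# with the abstract's conclusion that performance fell short of the margin.
```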