Abstract

Background: Chatbots based on large language models (LLMs) have recently developed an unprecedented ability to answer questions across a broad range of applications. Whether LLMs encode sufficient knowledge to answer questions about medical oncology, a highly specialized domain requiring rapid integration of new evidence, is unknown.

Methods: We presented ChatGPT (GPT-3.5 and GPT-4) with the American Society of Clinical Oncology (ASCO) Self Assessment Program and the European Society for Medical Oncology (ESMO) Examination Trial questions, excluding those that included images or required knowledge unavailable before the algorithm’s training cutoff date. The proportion of correct answers was compared against random chance. ChatGPT was prompted again for a different answer if the previous answer was incorrect. The reasoning provided by ChatGPT was qualitatively evaluated by two medical oncologists.

Results: ChatGPT (GPT-4) correctly answered 84.4% (38/45, 95% confidence interval [CI] 70.5-93.5%, P<0.0001 versus random answering) of the ASCO and 86.7% (65/75, 95% CI 76.8-93.4%, P<0.0001) of the ESMO examination questions. GPT-4 outperformed GPT-3.5 (57.8% [26/45, 95% CI 42.2-72.3%, P=0.001] for ASCO and 65.3% [49/75, 95% CI 53.5-76.0%, P=0.004] for ESMO). Including second attempts, GPT-4 correctly answered 93.3% (42/45, 95% CI 81.7-98.6%) of the ASCO and 93.3% (70/75, 95% CI 85.1-97.8%) of the ESMO examination questions. Incorrect responses to ASCO questions were more common for questions whose answers referenced papers published after 2018 (22.2% [4/18] versus 11.1% [3/27], P=0.03). Oncologists rated the reasoning behind correct answers by GPT-3.5 as complete for 93.3% of questions (70/75, 95% CI 85.1-97.8%).

Conclusions: LLMs can answer examination questions designed for medical oncology fellows with impressive and improving accuracy, alongside correct reasoning. These results imply broad potential applications of LLMs during cancer care to improve the patient and provider experience.
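The abstract does not state how the 95% confidence intervals were computed. As an illustration only, the sketch below assumes an exact (Clopper-Pearson) binomial interval, which appears consistent with the reported bounds; the function names `binom_tail_ge` and `clopper_pearson` are hypothetical helpers introduced here, not part of the study. The comparison against random chance is not reproduced, because the chance probability per question is not given.

```python
from math import comb


def binom_tail_ge(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for k successes in n trials.

    Assumption: the abstract's intervals are Clopper-Pearson intervals;
    the source does not name the method.
    """

    def solve(tail_k, target):
        # P(X >= tail_k) increases with p, so bisection on p finds the
        # value where the tail probability equals the target.
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if binom_tail_ge(tail_k, n, mid) < target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: p with P(X >= k) = alpha / 2.
    lower = 0.0 if k == 0 else solve(k, alpha / 2)
    # Upper bound: p with P(X <= k) = alpha / 2, i.e. P(X >= k + 1) = 1 - alpha / 2.
    upper = 1.0 if k == n else solve(k + 1, 1 - alpha / 2)
    return lower, upper


if __name__ == "__main__":
    # Counts taken from the Results paragraph above.
    for label, k, n in [
        ("GPT-4, ASCO, first attempt", 38, 45),
        ("GPT-4, ASCO, with second attempts", 42, 45),
        ("GPT-4, ESMO, with second attempts", 70, 75),
    ]:
        lower, upper = clopper_pearson(k, n)
        print(f"{label}: {k}/{n} = {k / n:.1%}, 95% CI {lower:.1%} to {upper:.1%}")
```

Under this assumption the interval depends only on the number of correct answers and the number of questions, so it can be recomputed for any of the reported proportions.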
