Purpose
Artificial intelligence (AI) and machine learning present an opportunity to enhance clinical decision-making in radiation oncology. This study aims to evaluate the competency of ChatGPT, an AI language model, in interpreting clinical scenarios and to assess its oncology knowledge.

Methods and Materials
A series of clinical cases was designed covering 12 disease sites. Questions were grouped into domains: epidemiology, staging and workup, clinical management, treatment planning, cancer biology, physics, and surveillance. Royal College-certified radiation oncologists (ROs) reviewed the cases and provided solutions. Using a standardized rubric, ROs scored responses on 3 criteria: conciseness (focused answers), completeness (addressing all aspects of the question), and correctness (answer aligns with expert opinion). Scores ranged from 0 to 5 for each criterion, for a total possible score of 15 per question.

Results
Across the 12 cases, 182 questions were answered, with a total AI score of 2317/2730 (84%). Scores by criterion were: completeness (79%, range: 70–99%), conciseness (92%, range: 83–99%), and correctness (81%, range: 72–92%). AI performed best in the domains of epidemiology (93%) and cancer biology (93%) and reasonably well in staging and workup (89%), physics (86%), and surveillance (82%). Weaker domains included treatment planning (78%) and clinical management (81%). Statistical differences were driven by variation in the completeness (p < 0.01) and correctness (p = 0.04) criteria, whereas conciseness scores were uniformly high (p = 0.91). These trends were consistent across disease sites.

Conclusions
ChatGPT showed potential as a tool in radiation oncology, demonstrating a high degree of accuracy in several oncologic domains. However, this study highlights limitations, with incorrect and incomplete answers in complex cases.