Abstract

ChatGPT is a powerful large language model that has been applied to many fields in medicine. Radiobiology is central to determining radiation dose in radiation oncology. We tested the efficacy of using ChatGPT to solve clinical radiobiology problems. A set of 30 clinical radiobiology questions was used in this study. The OpenAI API was used to query GPT-4o and generate corresponding answers (access date: 2024/6/26). Five different prompts (role-playing, role-playing with in-context learning, chain of thought, chain of thought with in-context learning, and in-context learning) and no prompt were used to test the ability of the model. Questions were further divided into two subgroups: 15 questions without an alpha-beta ratio and 15 questions with alpha-beta ratio(s). Answers were graded using the following grading system: grade 1, totally incorrect; grade 2, partially correct (either the answer or the calculation process is correct); grade 3, totally correct (both the answer and the calculation process are correct). The chi-square test was used to compare results between conditions. RStudio and the R language were used for the statistical analysis; p < 0.05 was set as statistically significant.

The detailed results for each prompt are shown below: no prompt (grade 1: 9, grade 2: 3, grade 3: 18), role-playing (grade 1: 7, grade 2: 6, grade 3: 17), role-playing with in-context learning (grade 1: 10, grade 2: 7, grade 3: 13), chain of thought (grade 1: 10, grade 2: 5, grade 3: 15), chain of thought with in-context learning (grade 1: 11, grade 2: 2, grade 3: 17), and in-context learning (grade 1: 10, grade 2: 4, grade 3: 16). For grade 1 versus grades 2-3, the chi-square test showed p = 0.6708, and for grades 1-2 versus grade 3, p = 0.829, indicating that performance was similar regardless of the prompt.
The detailed results for the two question types are shown below: questions without alpha-beta ratio (grade 1: 53, grade 2: 10, grade 3: 27) and questions with alpha-beta ratio(s) (grade 1: 4, grade 2: 17, grade 3: 69). For grade 1 versus grades 2-3, the chi-square test showed p < 0.01, and for grades 1-2 versus grade 3, p < 0.01, indicating that performance differed between questions with and without alpha-beta ratio(s). In conclusion, without prompting, ChatGPT answered about 60% of the questions completely correctly. Using different prompts did not enhance performance. ChatGPT performed well on questions with alpha-beta ratio(s).

Citation Format: Yung-Shuo Kao, Che-Wei Su, Kun-Yao Dai. Using ChatGPT to solve clinical radiobiology problems. [abstract]. In: Proceedings of the AACR Special Conference in Cancer Research: Translating Targeted Therapies in Combination with Radiotherapy; 2025 Jan 26-29; San Diego, CA. Philadelphia (PA): AACR; Clin Cancer Res 2025;31(2_Suppl):Abstract nr A006.
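The question-type comparison can be reproduced from the reported counts. Below is a minimal sketch of a Pearson chi-square test for a 2x2 contingency table in plain Python; the study itself used R, and note that R's chisq.test applies Yates' continuity correction to 2x2 tables by default, so exact p-values may differ slightly from this uncorrected version.

```python
import math

def chi2_2x2(table):
    """Pearson chi-square test for a 2x2 contingency table,
    without continuity correction. Returns (statistic, p_value), df = 1."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 df: erfc(sqrt(x / 2))
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p

# Grade 1 vs grades 2-3; rows = questions without / with alpha-beta ratio(s)
stat, p = chi2_2x2([[53, 37], [4, 86]])
```

Running this on the reported counts gives a chi-square statistic of roughly 61.6 with p far below 0.01, consistent with the reported significance; grades 1-2 versus grade 3 ([[63, 27], [21, 69]]) is likewise significant.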