Limitations of GPT-3.5 and GPT-4 in Applying Fleischner Society Guidelines to Incidental Lung Nodules.

Joel L Gamble, Adnan Sheikh, Duncan Ferguson, Joanna Yuen

doi:10.1177/08465371231218250

Open Access

https://doi.org/10.1177/08465371231218250

Abstract

Purpose: To evaluate the accuracy of GPT-3.5, GPT-4, and a fine-tuned GPT-3.5 model in applying Fleischner Society recommendations to lung nodules.

Methods: We generated 10 lung nodule descriptions for each of the 12 nodule categories from the Fleischner Society guidelines, incorporating them into a single fictitious report (n = 120). GPT-3.5 and GPT-4 were prompted to make follow-up recommendations based on the reports. We then incorporated the full guidelines into the prompts and re-submitted them. Finally, we re-submitted the prompts to a fine-tuned GPT-3.5 model. Results were analyzed using binary accuracy analysis in R.

Results: GPT-3.5 accuracy in applying Fleischner Society guidelines was 0.058 (95% CI: 0.02, 0.12). GPT-4 accuracy was improved at 0.15 (95% CI: 0.09, 0.23; P = .02 for accuracy comparison). In recommending PET-CT and/or biopsy, both GPT-3.5 and GPT-4 had an F-score of 0.00. After explicitly including the Fleischner Society guidelines in the prompt, GPT-3.5 and GPT-4 significantly improved their accuracy to 0.42 (95% CI: 0.33, 0.51; P < .001) and to 0.66 (95% CI: 0.57, 0.74; P < .001), respectively. GPT-4 remained significantly better than GPT-3.5 (P < .001). The fine-tuned GPT-3.5 model accuracy was 0.46 (95% CI: 0.37, 0.55), not different from the GPT-3.5 model with guidelines included (P = .53).

Conclusion: GPT-3.5 and GPT-4 performed poorly in applying widely known guidelines and never correctly recommended biopsy. Flawed knowledge and reasoning both contributed to their poor performance. While GPT-4 was more accurate than GPT-3.5, its inaccuracy rate was unacceptable for clinical practice. These results underscore the limitations of large language models for knowledge and reasoning-based tasks.
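The reported accuracies and confidence intervals can be illustrated with a short sketch. The abstract states only that a binary accuracy analysis was run in R. The Python code below is therefore an assumption-laden illustration, not the authors' method: it assumes an exact (Clopper-Pearson) binomial interval, and it infers the counts of correct answers (7 of 120 and 18 of 120) from the reported proportions 0.058 and 0.15. The function names are hypothetical. The F-score helper shows why a model that never correctly recommends biopsy scores 0.00 regardless of its other errors.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for a binomial proportion.

    Assumption: the abstract does not say which interval method was used.
    """

    def solve(f, target):
        # f(p) decreases as p grows on [0, 1]; bisect for f(p) == target.
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if f(mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    lower = 0.0 if k == 0 else solve(lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    upper = 1.0 if k == n else solve(lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper


def f_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


if __name__ == "__main__":
    n_reports = 120  # from the abstract: 10 descriptions x 12 categories
    # Counts below are inferred from the reported accuracies (assumption).
    for label, correct in [("GPT-3.5", 7), ("GPT-4", 18)]:
        lo, hi = clopper_pearson(correct, n_reports)
        print(f"{label}: accuracy={correct / n_reports:.3f}, 95% CI=({lo:.2f}, {hi:.2f})")
```

With zero true positives, `f_score` returns 0.0 for any number of false positives or false negatives, which matches the abstract's statement that neither model ever correctly recommended biopsy.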

Journal: Canadian Association of Radiologists Journal
Publication Date: Dec 25, 2023
Citations: 4
License type: CC BY-NC 4.0

Similar Papers

P23 A review of advice given for follow up of lung nodules detected on CT imaging
H Rostom ... R Mogal
Thorax | VOL. 71 | 15 Nov 2016

Fleischner Society Guideline Recommendations for Incidentally Detected Pulmonary Nodules and the Probability of Lung Cancer.
Farhood Farjah ... Michael K Gould
Journal of the American College of Radiology | VOL. 19 | 01 Nov 2022

MTE 27.02 Pulmonary Nodule Guidelines: How Do We Decide Between the IELCAP, ACCP, NCCN, Fleischner Society, BTS, and Lung-RADS?
J.M Goo
Journal of Thoracic Oncology | VOL. 12 | 01 Nov 2017

ES 02.02 The Fleischner Guideline / Lung-RADs
M Callister
Journal of Thoracic Oncology | VOL. 12 | 01 Nov 2017