Abstract

Patients now have direct access to their diagnostic imaging reports. However, these reports often include complex terminology that can be difficult for patients to understand. ChatGPT (OpenAI, San Francisco, CA) is an artificial intelligence (AI) text-generating model that can simplify complex text and generate human-like responses. We assessed ChatGPT's ability to generate summarized MRI reports for patients with prostate cancer and evaluated physician satisfaction with providing patients with an AI-summarized report. We used ChatGPT to summarize five prostate cancer MRI reports performed at our institution from 2021-2022. Using a standard prompt, we asked ChatGPT to summarize each full MRI report into a patient letter at a 6th grade reading level. To account for variability in text output, we generated three different summarized reports per unique MRI report. Full MRI and summarized reports were assessed for readability using the Flesch-Kincaid Grade Level (FK) score. Radiation oncologists at our institution were asked to evaluate the summarized reports with an anonymous questionnaire. Physicians were shown two full MRI reports and three summarized versions of each full report. For each summarized report, physicians were asked six questions assessing the following: factual correctness, ease of understanding, completeness, potential for harm, overall quality, and likelihood they would send the report to a patient. Responses were given on a 5-point Likert-type scale. A total of 15 summarized reports were generated from the five full MRI reports using ChatGPT. The median FK score for the full MRI reports vs. the summarized reports was 9.6 vs. 5.0 (p<0.05). Twelve radiation oncologists responded to our questionnaire, with experience levels of: resident (25%), attending <5 years (33%), attending 5-10 years (17%), and attending >10 years (25%).
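The abstract does not state the exact prompt wording used in the study. Purely as an illustration, a summarization request like the one described could be assembled for a chat-style model as follows; the instruction phrasing, system message, and message structure here are assumptions, not the study's actual prompt:

```python
def build_summary_prompt(mri_report: str) -> list[dict]:
    """Assemble chat-style messages asking a model to rewrite a full
    prostate MRI report as a patient letter at a 6th grade reading level.

    NOTE: the exact prompt used in the study is not given in the
    abstract; this wording is hypothetical and for illustration only.
    """
    instruction = (
        "Summarize the following prostate MRI report as a letter to the "
        "patient, written at a 6th grade reading level."
    )
    return [
        {"role": "system", "content": "You are a helpful medical writing assistant."},
        {"role": "user", "content": f"{instruction}\n\n{mri_report}"},
    ]
```

Generating three summaries per report, as the study did, would then be a matter of sending the same messages three times at a nonzero sampling temperature and collecting the outputs.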
The mean [SD] ratings across all six summarized reports for each of the questions were: factual correctness (4.0 [0.6]), ease of understanding (4.0 [0.7]), completeness (4.1 [0.5]), potential for harm (3.5 [0.9]), overall quality (3.4 [0.9]), and likelihood to send to patient (3.1 [1.1]). 89%, 78%, and 93% of respondents answered agree or strongly agree for correctness, ease of understanding, and completeness of the summarized reports, respectively. 51%, 53%, and 46% of respondents answered agree or strongly agree for potential for harm, overall quality, and likelihood to send to patient, respectively. ChatGPT was able to summarize prostate MRI reports at a reading level appropriate for patients. Physicians were likely to be satisfied with the summarized reports with respect to factual correctness, ease of understanding, and completeness. They were less likely to be satisfied with respect to potential for harm, overall quality, and likelihood to send to patients. Further research is needed to optimize ChatGPT's ability to summarize radiology reports for patients and to understand what factors influence physician trust in AI-summarized reports.
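The readability metric reported above, the Flesch-Kincaid Grade Level, is a simple function of sentence, word, and syllable counts: 0.39 × (words/sentences) + 11.8 × (syllables/words) − 15.59. A minimal sketch is below; it uses a naive vowel-group syllable heuristic, whereas the study's tooling is not specified and production readability libraries use more careful syllable counting:

```python
import re

def count_syllables(word: str) -> int:
    """Crude syllable estimate: count vowel groups, dropping a trailing
    silent 'e'. A heuristic, not a dictionary-based count."""
    word = word.lower()
    groups = re.findall(r"[aeiouy]+", word)
    n = len(groups)
    if word.endswith("e") and not word.endswith(("le", "ee")) and n > 1:
        n -= 1
    return max(n, 1)

def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level:
    0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)
```

On this scale, a score of 5.0 corresponds roughly to text a 5th grader could read, which is how the summarized reports' median of 5.0 compares against the full reports' 9.6.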