Decoding medical jargon: The use of AI language models (ChatGPT-4, BARD, microsoft copilot) in radiology reports

Murat Tepe,Emre Emekli

doi:10.1016/j.pec.2024.108307

Abstract

ObjectiveEvaluate Artificial Intelligence (AI) language models (ChatGPT-4, BARD, Microsoft Copilot) in simplifying radiology reports, assessing readability, understandability, actionability, and urgency classification. MethodsThis study evaluated the effectiveness of these AI models in translating radiology reports into patient-friendly language and providing understandable and actionable suggestions and urgency classifications. Thirty radiology reports were processed using AI tools, and their outputs were assessed for readability (Flesch Reading Ease, Flesch-Kincaid Grade Level), understandability (PEMAT), and the accuracy of urgency classification. ANOVA and Chi-Square tests were performed to compare the models' performances. ResultsAll three AI models successfully transformed medical jargon into more accessible language, with BARD showing superior readability scores. In terms of understandability, all models achieved scores above 70%, with ChatGPT-4 and BARD leading (p < 0.001, both). However, the AI models varied in accuracy of urgency recommendations, with no significant statistical difference (p = 0.284). ConclusionAI language models have proven effective in simplifying radiology reports, thereby potentially improving patient comprehension and engagement in their health decisions. However, their accuracy in assessing the urgency of medical conditions based on radiology reports suggests a need for further refinement. Practice implicationsIncorporating AI in radiology communication can empower patients, but further development is crucial for comprehensive and actionable patient support.

Full Text