Abstract

This project examines ChatGPT's potential to enhance the readability of patient education materials about interventional radiology (IR) procedures. Descriptions of IR procedures from the Cardiovascular and Interventional Radiological Society of Europe (CIRSE) served as the original texts. Readability was scored with three metrics, calculated using an online calculator (https://readabilityformulas.com): Flesch Reading Ease (FRE), Gunning Fog (GF), and the Automated Readability Index (ARI). FRE is scored on a 0-100 scale, where higher scores indicate easier-to-read text; GF and ARI estimate the grade level required to comprehend the text. The DISCERN instrument measured credibility and reliability. ChatGPT was prompted to simplify each text to a fifth-grade reading level, and the readability and DISCERN scores were then recalculated for comparison. Statistical significance was assessed with the Wilcoxon signed-rank test, and articles were also organized into subgroups for analysis. Seventy-three IR procedure descriptions from CIRSE were analyzed. The mean FRE score improved from 47.2 (Difficult) in the original texts to 78.4 (Fairly Easy) after ChatGPT simplification, while the mean GF and ARI scores dropped from 14.4 and 11.2 to 7.8 and 5.8, respectively; all improvements were significant (p < 0.001). However, the mean DISCERN score decreased from 3.73 to 2.99 (p < 0.001) after simplification. This study demonstrates ChatGPT's ability to make IR procedure descriptions more readable, but highlights its difficulty in preserving the reliability of the original material, suggesting the need for human review and prompt engineering to improve outcomes. Level of Evidence: 6.
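As an illustration of the scoring pipeline described above, the sketch below is a minimal reconstruction, not the authors' actual code: it assumes the open-source textstat package as a stand-in for the online calculator (which implements the standard formulas, e.g. FRE = 206.835 - 1.015 × (words/sentence) - 84.6 × (syllables/word)) and uses scipy.stats.wilcoxon for the paired comparison. The sample texts and variable names are hypothetical; the study itself compared 73 paired descriptions.

```python
# Minimal sketch of the readability comparison described in the abstract.
# Assumptions: the open-source `textstat` package stands in for the online
# calculator used by the authors, and paired before/after scores are compared
# with a Wilcoxon signed-rank test, as reported in the study.
from statistics import mean

import textstat
from scipy.stats import wilcoxon

def readability_scores(text: str) -> dict:
    """Compute the three readability metrics used in the study."""
    return {
        "FRE": textstat.flesch_reading_ease(text),          # 0-100; higher = easier
        "GF": textstat.gunning_fog(text),                   # U.S. grade level
        "ARI": textstat.automated_readability_index(text),  # U.S. grade level
    }

# Hypothetical paired texts: CIRSE-style originals and ChatGPT simplifications.
# The actual study used 73 procedure descriptions.
originals = [
    "Angioplasty is a minimally invasive endovascular procedure performed to "
    "widen narrowed or obstructed arteries.",
    "Embolization occludes pathological vasculature via transcatheter delivery "
    "of embolic agents.",
    "Thrombolysis entails catheter-directed administration of fibrinolytic "
    "agents to dissolve intravascular thrombus.",
]
simplified = [
    "Angioplasty is a small procedure that opens blood vessels that have "
    "become too narrow.",
    "Embolization blocks off harmful blood vessels using a thin tube.",
    "Thrombolysis uses a thin tube to deliver medicine that breaks up blood clots.",
]

orig_scores = [readability_scores(t) for t in originals]
simp_scores = [readability_scores(t) for t in simplified]

# Paired, non-parametric comparison for each metric.
for metric in ("FRE", "GF", "ARI"):
    before = [s[metric] for s in orig_scores]
    after = [s[metric] for s in simp_scores]
    stat, p = wilcoxon(before, after)
    print(f"{metric}: mean {mean(before):.1f} -> {mean(after):.1f} (p = {p:.3f})")
```

With only three hypothetical pairs the p-values above are not meaningful; at the study's sample size of 73, the Wilcoxon signed-rank test is a natural choice because paired readability scores are not guaranteed to be normally distributed.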
