Abstract

Background
Chat-based artificial intelligence (AI) web interfaces that aim to mimic human conversation are increasingly used in healthcare, from simple tasks such as scheduling appointments to more complex tasks such as answering patient questions about COVID-19, as done by the World Health Organization.1 Chat-based AI has also been shown to provide accurate responses to cardiovascular disease prevention questions.2 Its ability to provide patient education for more complex treatments, such as atrial fibrillation (AF) ablation, has not been explored.

Purpose
To evaluate the quality of a popular chat-based AI program's answers to patient questions about AF ablation.

Methods
Twenty commonly asked questions ("prompts") regarding AF ablation were entered into ChatGPT (Chat Generative Pre-trained Transformer), a large language model-based AI program (Fig. 1). Prompts were written in plain language; technical terms were avoided except for "radiofrequency", "cryoablation", and "pulsed field ablation" (PFA). The SMOG readability calculator was used to assess the difficulty and grade level of responses, as healthcare organizations recommend ≤ 8th-grade complexity for patient information. Response content was graded by 3 experienced cardiac electrophysiologists as "reasonable", "missing important elements/some inaccuracies", or "misleading/inappropriate". Results are presented as mean ± standard deviation and percentages. (Illustrative sketches of the prompting and readability-scoring steps are given after the abstract.)

Results
Responses averaged 118 ± 67 words (Fig. 1). Of 20 responses, 17 (85%) were deemed reasonable, 3 (15%) were missing important elements or contained some inaccuracies, and none were inappropriate or misleading; 16 (80%) emphasized discussing issues with the healthcare team (Fig. 2). The responses missing important elements or containing inaccuracies concerned the risks/complications of ablation [omitting phrenic nerve palsy, atrioesophageal fistula (AEF), potential need for emergent cardiac surgery or a pacemaker, and death], concerning symptoms post-procedure (omitting symptoms of hematoma, AEF, and stroke), and the fact that PFA is not yet approved for use in all regions. The average reading grade level of responses was 13.8 (college level or "professional"): 17 (85%) responses were at or above 12th-grade level, 11 (55%) were college level or higher, and 6 (30%) were college-graduate level ("extremely difficult"). None were at or below 8th-grade level (Fig. 2).

Conclusions
A majority of ChatGPT responses to common patient questions about AF ablation had reasonable content quality and frequently emphasized the importance of discussion with the healthcare team. However, responses to more difficult questions regarding risks, symptoms of potential complications, or newer technology missed important details, and more than half of the responses required college-level reading skills. While use of chat-based AI for patient education on electrophysiology topics appears promising, patients should be advised to use caution. Further AI training to improve content and readability should be explored.
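The study entered each prompt by hand into the ChatGPT web interface. For readers who want to reproduce the workflow programmatically, a minimal sketch using the OpenAI Python SDK (v1.x) follows; the model name and the two sample prompts are illustrative assumptions, since the abstract does not name a model version and the full set of 20 questions appears only in Fig. 1.

    # Sketch only: the study used the ChatGPT web interface; this is a
    # rough programmatic equivalent via the OpenAI Python SDK (v1.x).
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    prompts = [
        "What is atrial fibrillation ablation?",         # assumed example
        "What are the risks of pulsed field ablation?",  # assumed example
    ]

    for prompt in prompts:
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",  # assumption, not stated in the abstract
            messages=[{"role": "user", "content": prompt}],
        )
        answer = response.choices[0].message.content
        print(f"Q: {prompt}\n\nA: {answer}\n{'-' * 60}")

Collecting responses this way also makes it straightforward to record word counts and response text alongside each question for the readability analysis.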

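The SMOG grade used in Methods is computed from the count of polysyllabic words (three or more syllables), normalized to a 30-sentence sample: grade = 1.0430 × √(polysyllables × 30 / sentences) + 3.1291 (McLaughlin, 1969). Below is a minimal Python sketch; the vowel-group syllable counter is an assumption for illustration, so dedicated SMOG calculators (such as the one used in the study) will return slightly different grades.

    import math
    import re

    def smog_grade(text: str) -> float:
        """Estimate the SMOG reading grade level (McLaughlin, 1969):
        grade = 1.0430 * sqrt(polysyllables * 30 / sentences) + 3.1291
        """
        sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
        if not sentences:
            raise ValueError("text contains no sentences")
        words = re.findall(r"[A-Za-z']+", text)

        def syllables(word: str) -> int:
            # Rough heuristic: count vowel groups. Real readability tools
            # count syllables more carefully, so grades will differ slightly.
            return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

        polysyllables = sum(1 for w in words if syllables(w) >= 3)
        return 1.0430 * math.sqrt(polysyllables * 30 / len(sentences)) + 3.1291

    # Example usage with a hypothetical one-sentence response:
    print(round(smog_grade("Radiofrequency ablation eliminates arrhythmogenic tissue."), 1))

Dense medical terms are almost always polysyllabic, which is why responses heavy in technical vocabulary land in the college-level range reported in Results.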