Background: ChatGPT-4.0 and Bard have shown clinical decision support (CDS) potential in general medicine, but their role in cardiac electrophysiology (EP) is unknown. This study evaluates the CDS potential of ChatGPT and Bard by assessing their accuracy on multiple-choice questions (MCQs), guideline recommendations (GRs), and treatment (Tx) suggestions.
Methods: The two chatbots were tested with 15 clinical vignettes (CVs) and 47 case-related MCQs from Heart Rhythm Case Reports, focusing on ablation, arrhythmia, and management of cardiac implantable electronic devices (CIEDs). CVs included narrative descriptions of diagnostic imaging results. Three tasks were performed: 1) Generating GRs, rated 0 if incorrect or correct but irrelevant to the primary problem, 0.5 if correct for the primary problem, and 1 (case-specific, CS) if relevant to both the primary problem and concomitant conditions (e.g., atrial fibrillation with heart failure). 2) Suggesting Tx steps, scored 0 for incorrect, 0.5 for correct, and 1 for CS. A Tx was deemed correct if it was referenced in the case or in guidelines, and CS if it was used in the case. For Tx responses that were not CS, a prompt containing one similar CV and its Tx from PubMed case reports was provided before reassessment. 3) Answering MCQs, rated 1 for correct and 0 for incorrect. Welch's t-test was used for analysis.
Results: Bard outperformed ChatGPT in generating CS-GRs (P = 0.01). However, there was no significant difference between the chatbots in CS-Tx suggestions, either with a prompt (P = 0.12, Figure 1C) or without a prompt (P = 0.59, Figure 1A). When prompted after non-CS-Tx responses, ChatGPT improved significantly from 0.66 to 0.93 (P = 0.02), suggesting an enhanced ability to provide CS-Tx plans after prompting. In contrast, Bard showed no notable improvement (0.73 vs. 0.76, P = 0.79, Figure 1B). Both chatbots demonstrated similar MCQ accuracy, with scores below 70%, indicating gaps in EP training or the need for prompts to activate existing knowledge.
Conclusion: This study showed Bard's superiority in generating GRs and ChatGPT's remarkable improvement in suggesting Tx when external knowledge is provided, revealing their CDS potential in specialized fields.
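The scoring rubric and Welch's t-test described in the Methods can be illustrated with a minimal sketch. This is not the authors' analysis code: the names `RUBRIC`, `score`, and `welch_t` are hypothetical, the ratings in the usage lines are made-up placeholders rather than study data, and the sketch assumes each group has at least two ratings with non-zero variance in at least one group.

```python
from math import sqrt
from statistics import mean, variance

# Rubric as stated in the abstract: 0 = incorrect, 0.5 = correct, 1 = case-specific (CS).
# The label strings are hypothetical names chosen for this sketch.
RUBRIC = {"incorrect": 0.0, "correct": 0.5, "case_specific": 1.0}


def score(labels):
    """Map a list of rating labels to numeric rubric scores."""
    return [RUBRIC[label] for label in labels]


def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom).

    Welch's t-test compares two means without assuming equal variances.
    """
    # Per-group squared standard errors, using sample variance (n - 1 denominator).
    se_a = variance(a) / len(a)
    se_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(a) - 1) + se_b ** 2 / (len(b) - 1)
    )
    return t, df


# Placeholder ratings for two chatbots (illustrative only, not data from the study).
bot_a = score(["correct", "case_specific", "incorrect", "correct"])
bot_b = score(["case_specific", "case_specific", "correct", "case_specific"])
t_stat, dof = welch_t(bot_a, bot_b)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

The P value would then be read from a t distribution with `dof` degrees of freedom, for example with a statistics package; that step is omitted here to keep the sketch dependency-free.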