Abstract

Background
Medical trainees are increasingly using online chat-based artificial intelligence (AI) platforms as supplementary resources for board exam preparation and clinical decision support. Prior studies have evaluated the performance of AI chatbots such as ChatGPT on general standardized tests, including the United States Medical Licensing Examination (USMLE), but little is known about their performance on subspecialty-focused exam questions, particularly those related to clinical management and treatment.

Objective
This study aims to evaluate the performance of ChatGPT version 4.0 on the cardiovascular questions from the Medical Knowledge Self-Assessment Program (MKSAP) 19, a widely used board exam preparation resource in the United States.

Methods
We submitted all cardiovascular questions from MKSAP 19 to ChatGPT 4.0, covering a broad range of cardiology topics in a multiple-choice format. Performance was evaluated against both the official MKSAP answer key and average trainee scores obtained from the MKSAP website. Of 129 questions, 4 were invalidated by post-publication data and 18 were excluded because they relied on visual aids, leaving 107 questions for analysis.

Results
ChatGPT 4.0 correctly answered 93 of 107 questions, an accuracy rate of 87%, compared with a 60% average accuracy rate among all human users on the same questions (p<0.0001). ChatGPT accuracy rates for each question category (e.g., heart failure, electrophysiology) are provided in the figure. On the 14 questions that ChatGPT answered incorrectly, human users averaged a 47% accuracy rate. All 14 incorrectly answered questions related to clinical management and treatment (as opposed to diagnosis or epidemiology), and the majority (10/14) involved choosing among treatment medications or between medications and interventions.

Conclusion
ChatGPT 4.0 surpassed the 70% accuracy threshold typically required to pass the internal medicine board certification exam but performed worse on questions about clinical management and treatment. This performance suggests that ChatGPT 4.0 could be an effective adjunct tool for cardiology education but raises concerns about its use in clinical decision support for trainees.