Abstract

Background: ChatGPT, the trending novel artificial intelligence model, has triggered ongoing debate regarding its capabilities. Recent preliminary reports showed that it answered the majority of questions of the United States Medical Licensing Examination (USMLE) correctly. However, its ability to pass a more specialized, challenging and high-stakes post-graduate test, such as the final exam for the completion of medical residency, like the European Exam in Core Cardiology (EECC), is not yet known.

Purpose: We sought to evaluate the performance of ChatGPT on the EECC, to test its capability on a more demanding, high-stakes post-graduate exam in cardiology training.

Methods: A total of 488 publicly available single-answer multiple-choice questions (MCQs) were randomly obtained from three MCQ sources traditionally used for EECC preparation: 88 from the sample exam questions released since 2018 on the official ESC website, 200 from the 2022 edition of StudyPRN, and 200 from Braunwald's Heart Disease Review and Assessment (BHDRA). Questions containing audio or visual assets were excluded. After filtering, 362 MCQ items (ESC sample: 68, BHDRA: 150, StudyPRN: 144) were included as the input source. False and indeterminate responses were scored as incorrect.

Results: ChatGPT answered 340 of the 362 questions, with 22 indeterminate answers in total. The overall accuracy was 58.8% across all question sources. It answered correctly 42/68 ESC sample questions (4 indeterminate), 79/150 BHDRA questions (11 indeterminate) and 92/144 StudyPRN questions (7 indeterminate), corresponding to accuracies of 61.7%, 52.6% and 63.8%, respectively.

Conclusion: ChatGPT correctly answers the majority of the EECC's questions and performs within the passing threshold range. Although it cannot yet process visual content, it can provide reasoned and correct answers to text-based inputs in most scenarios. The model may be able to efficiently handle a massive amount of acquired medical knowledge, but the current approach may not substitute for critical thinking, innovation and creativity, some of the key attributes that doctors are expected to demonstrate.

Figures: Performance of ChatGPT on EECC; Example of MCQ input at ChatGPT.
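For illustration, the scoring rule described in the Methods (false and indeterminate responses counted as incorrect) and the reported per-source counts can be reproduced with a minimal Python sketch. The variable names and layout below are illustrative assumptions, not the authors' analysis code, and the per-source percentages may differ from the abstract's figures by about 0.1% depending on rounding.

```python
# Sketch of the scoring described in the Methods: indeterminate answers are
# treated the same as false answers (i.e. incorrect), so accuracy is simply
# correct / total per source. Counts are taken from the reported Results.
sources = {
    "ESC sample": {"total": 68,  "correct": 42, "indeterminate": 4},
    "BHDRA":      {"total": 150, "correct": 79, "indeterminate": 11},
    "StudyPRN":   {"total": 144, "correct": 92, "indeterminate": 7},
}

def accuracy(correct: int, total: int) -> float:
    """Accuracy with false and indeterminate responses scored as incorrect."""
    return correct / total

for name, s in sources.items():
    print(f"{name}: {accuracy(s['correct'], s['total']):.1%}")

overall_correct = sum(s["correct"] for s in sources.values())  # 213
overall_total = sum(s["total"] for s in sources.values())      # 362
print(f"Overall: {accuracy(overall_correct, overall_total):.1%}")  # ~58.8%
```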
