Abstract

Purpose

ChatGPT-4 is an upgraded version of ChatGPT, an artificial intelligence (AI) chatbot. Its performance on the United States Medical Licensing Examination (USMLE) has not been independently characterized. We aimed to assess how well ChatGPT-4 answers USMLE Step 1, Step 2CK, and Step 3 practice questions.

Method

Practice multiple-choice questions for USMLE Step 1, Step 2CK, and Step 3 were compiled. Of 376 available questions, 319 (85%) were submitted to ChatGPT-4 on March 21, 2023. The primary outcome was ChatGPT-4's performance on the practice Step 1, Step 2CK, and Step 3 examinations, measured as the proportion of multiple-choice questions answered correctly. Secondary outcomes were the mean length of the questions and the mean length of ChatGPT-4's responses.

Results

ChatGPT-4 responded to 319 text-based multiple-choice questions from USMLE practice test material. It answered 82 of 93 (88%) questions correctly on Step 1, 91 of 106 (86%) on Step 2CK, and 108 of 120 (90%) on Step 3. It also provided explanations for all questions. Its mean response time per question was 30.8 ± 11.8 s for Step 1, 23.0 ± 9.4 s for Step 2CK, and 23.1 ± 8.3 s for Step 3. Questions that ChatGPT-4 answered correctly and incorrectly had similar mean lengths (difference = 17.48 characters, SE = 59.75, 95% CI = [-100.09, 135.04], t = 0.29, p = 0.77). ChatGPT-4's correct responses were significantly shorter on average than its incorrect responses (difference = 79.58 characters, SE = 35.42, 95% CI = [9.89, 149.28], t = 2.25, p = 0.03).

Conclusions

ChatGPT-4 answered a remarkably high proportion of USMLE practice questions correctly. It also performed substantially better on them than earlier versions of the same AI chatbot.
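You can check the percentages and test statistics in the Results against the reported counts, differences, and standard errors. The Python sketch below is for illustration only. It assumes each 95% CI was computed as difference ± t_crit × SE, with t_crit ≈ 1.9676. That value is inferred from the reported interval widths, because the abstract does not name the test or its degrees of freedom. The names RESULTS, T_CRIT, percent_correct, and t_and_ci are hypothetical.

```python
# Recompute the abstract's summary figures from the values it reports.
# Assumption: each 95% CI = difference +/- T_CRIT * SE, where T_CRIT is
# inferred from the reported interval widths (the exact test is not stated).

# (correct, analyzed) per practice exam, as reported in the Results.
RESULTS = {
    "Step 1": (82, 93),
    "Step 2CK": (91, 106),
    "Step 3": (108, 120),
}

T_CRIT = 1.9676  # assumed two-sided 95% critical value (see note above)


def percent_correct(correct: int, total: int) -> int:
    """Share of questions answered correctly, rounded to a whole percent."""
    return round(100 * correct / total)


def t_and_ci(diff: float, se: float, t_crit: float = T_CRIT) -> tuple[float, float, float]:
    """Return (t statistic, CI lower bound, CI upper bound) for a mean difference."""
    half_width = t_crit * se
    return diff / se, diff - half_width, diff + half_width


if __name__ == "__main__":
    for step, (correct, total) in RESULTS.items():
        print(f"{step}: {correct}/{total} = {percent_correct(correct, total)}%")

    # Correct vs. incorrect: question-length and response-length comparisons.
    for label, diff, se in [("question length", 17.48, 59.75),
                            ("response length", 79.58, 35.42)]:
        t, lo, hi = t_and_ci(diff, se)
        print(f"{label}: t = {t:.2f}, 95% CI = [{lo:.2f}, {hi:.2f}]")
```

Running the script prints the three percentages (88%, 86%, 90%) and t values of 0.29 and 2.25. The recomputed interval bounds match the reported ones to within 0.01, and the reported difference and SE were themselves rounded, which plausibly explains the gap.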
