Abstract

In this study, we aimed to evaluate the success of ChatGPT by measuring its performance on the last 5 medical specialty exams (MSE) administered, determining the ranking it would have achieved among the candidates of each year, and assessing its potential use in healthcare services. Publicly available MSE questions and answer keys from the last 5 years were reviewed, and a total of 1177 questions were included in the study. All questions were posed to ChatGPT (OpenAI; San Francisco, CA), GPT-3.5 series, March 23, 2023 version. The average score ChatGPT would have obtained and the rank it would have achieved had it taken the exam that year were determined. Questions were categorized into short and long question groups, and into single-select and multi-select multiple-choice groups. ChatGPT's correct-answer rate ranged from a low of 54.3% to a high of 70.9%. It achieved a passing result, ranking 1787th out of 22,214 candidates on its most successful exam and 4428th out of 21,476 on its least successful one. No statistically significant difference was found between its correct answers to clinical and basic science questions (P = .66). ChatGPT answered a statistically significantly higher proportion of questions correctly in the short question group than in the long question group (P = .03), and in the single-select multiple-choice group than in the multi-select multiple-choice group (P < .001). ChatGPT was successful on the MSE, a challenging exam for doctors in our country. However, ChatGPT still lags behind human experts in the field for now, and what future program developments will bring remains a matter of curiosity for all of us.
