United States Medical Licensing Examination Research Articles

Background: The United States Medical Licensing Examination (USMLE) Step 1 is one of the two examinations taken after completion of the first two years (basic science stage) of medical school to become eligible to apply for residency training in the USA. A large number and variety of study materials are available for the exam, which may confuse students choosing a resource. We investigated the learning resources used by third- and fifth-semester medical students and their association with academic performance. We also compared the learning resources and exam scores of high-performing and low-performing students.

Methods: Data were collected using structured (quantitative study) and semi-structured (qualitative study) questionnaires during face-to-face interviews. This article covers the quantitative part, which was designed as a correlational study. Single-factor one-way analysis of variance (ANOVA), the Pearson correlation coefficient, the t-test, and Fisher’s exact test were used to analyze the data.

Results: About half of all students used three or more commercial resources dealing with the same content. A weak negative correlation was observed between the number of commercial resources and the exam scores, especially when the number of these resources was three or more (r = -0.26). The mean exam score of textbook users was statistically significantly higher than the mean score of textbook non-users (p = 0.01). The usage of textbooks was statistically significantly higher in the cohort of top performers in comparison to the rest of the students (p = 0.006). In addition to less usage of textbooks, the mean number of review books was higher in the group of weakest students (2.84 versus 3.7; p = 0.75).

Conclusions: Most students did not use professional textbooks, and about half used too many commercial review resources. While the former was significantly associated with poor academic performance, the latter had a weak negative correlation with exam score. Pedagogical interventions are urgently needed to make the right type of learning resources available by making professional textbooks more USMLE-oriented and by helping students choose the best resources, and the right number of them, for optimum academic performance. Meeting the observed needs of the students in this way might empower them through self-determination, which would motivate their studies.

Background: Artificial intelligence (AI) large language models (LLMs) such as ChatGPT have demonstrated the ability to pass standardized exams. These models are not trained for a specific task, but instead are trained to predict sequences of text from large corpora of documents sourced from the internet. It has been shown that even models trained on this general task can pass exams in a variety of domain-specific fields, including the United States Medical Licensing Examination. We asked whether large language models would perform as well on much narrower subdomain tests designed for medical specialists. Furthermore, we wanted to better understand how progressive generations of GPT (generative pre-trained transformer) models may be evolving in the completeness and sophistication of their responses even while generational training remains general. In this study, we evaluated the performance of two versions of GPT (GPT-3 and GPT-4) on their ability to pass the certification exam given to physicians to work as osteoporosis specialists and become certified clinical densitometrists (CCD). The CCD exam has a possible score range of 150 to 400, and a score of 300 is required to pass.

Methods: A 100-question multiple-choice practice exam was obtained from a third-party exam preparation website that mimics the accredited certification tests given by the ISCD (International Society for Clinical Densitometry). The exam was administered to two versions of GPT, the free version (GPT Playground) and ChatGPT+, which are based on GPT-3 and GPT-4, respectively (OpenAI, San Francisco, CA). The systems were prompted with the exam questions verbatim. If the response was purely textual and did not specify which of the multiple-choice answers to select, the authors matched the text to the closest answer. Each exam was graded, and an estimated ISCD score was provided by the exam website. In addition, each response was evaluated by a rheumatologist CCD and ranked for accuracy using a 5-level scale. The two GPT versions were compared in terms of response accuracy and length.

Results: The average response length was 11.6 ± 19 words for GPT-3 and 50.0 ± 43.6 words for GPT-4. GPT-3 answered 62 questions correctly, resulting in a failing ISCD score of 289. However, GPT-4 answered 82 questions correctly, with a passing score of 342. GPT-3 scored highest on the “Overview of Low Bone Mass and Osteoporosis” category (72% correct), while GPT-4 scored well above 80% accuracy on all categories except “Imaging Technology in Bone Health” (65% correct). Regarding subjective accuracy, GPT-3 answered 23 questions with nonsensical or totally wrong responses, while GPT-4 had no responses in that category.

Conclusion: If this had been an actual certification exam, GPT-4 would now have a CCD suffix to its name even after being trained using general internet knowledge. Clearly, more goes into physician training than can be captured in this exam. However, GPT algorithms may prove to be valuable physician aids in the diagnosis and monitoring of osteoporosis and other diseases.

Articles published on United States Medical Licensing Examination

Out-of-Hospital Birth in Medical Education: A Thematic Analysis of Question Banks for Medical Licensing Examinations in the United States.

Letters of Reference in the Current Era.

Dear Program Director: An Evaluation of Implicit Bias in Letters of Recommendation for Neurosurgery Residency.

Evaluating ChatGPT-4 in medical education: an assessment of subject exam performance reveals limitations in clinical curriculum support for students

Exploring the usage of learning resources by medical students in the basic science stage and their effect on academic performance

Gender differences in learning and study strategies impact medical students’ preclinical and USMLE step 1 examination performance

Student Perspectives on the Undergraduate Medical Student Research Experience at a Single United States Allopathic Institution.

Academic performance of students in an accelerated medical pathway

DO seniors and IMGs have lower match probabilities than MD seniors after adjusting for specialty choice and USMLE Step 1 score.

Medical students' perspective on the United States Medical Licensing Examination (USMLE) Step 1 transition to Pass/Fail

Can USMLE and COMLEX-USA Scores Predict At-Risk Emergency Medicine Residents' Performance on In-Training Examinations?

The Impact of United States Medical Licensing Examination Step 1 Transitioning To Pass/Fail on Medical Student Perception of Research Needed to Match Into One's Preferred Specialty.

Matching Against Men: 5 Years of Residency Match Data Show Disparities Still Exist.

Racial and Ethnic Disparities in Situational Judgment Testing Among Applicants to an Anesthesiology Residency Program.

Program Directors' Perspectives on Residency Applications in the Post‒United States Medical Licensing Examination Step 1 Era: A Case for Standardized Letters of Recommendation?

A Comparison Between GPT-3.5, GPT-4, and GPT-4V: Can the Large Language Model (ChatGPT) Pass the Japanese Board of Orthopaedic Surgery Examination?

The performance of large language models in intercollegiate Membership of the Royal College of Surgeons examination.

Evaluating Urology Program Directors' Perception on Resident Application Parameters Following the Transition of USMLE Step 1 to Pass/Fail

Performance of Progressive Generations of GPT on an Exam Designed for Certifying Physicians as Certified Clinical Densitometrists

Update on the Selection Criteria of Plastic Surgery Residents: A Survey of Program Directors and Associate Program Directors.
