Can Artificial Intelligence Pass the American Board of Orthopaedic Surgery Examination? Orthopaedic Residents Versus ChatGPT.

Zachary C Lum

doi:10.1097/corr.0000000000002704

Open Access

Abstract

Neural networks, deep learning, and artificial intelligence (AI) have advanced recently. Previous deep learning AI has been built for specific domains and trained on datasets drawn from those areas of interest, which yields high accuracy and precision. ChatGPT (OpenAI), a new AI model that uses a large language model (LLM) and is not restricted to a specific domain, has gained attention. Although AI has demonstrated proficiency in managing vast amounts of data, implementation of that knowledge remains a challenge.

This study asked three questions. (1) What percentage of Orthopaedic In-Training Examination questions can a generative, pretrained transformer chatbot (ChatGPT) answer correctly? (2) How does that percentage compare with results achieved by orthopaedic residents of different levels, and if scoring lower than the 10th percentile relative to 5th-year residents is likely to correspond to a failing American Board of Orthopaedic Surgery score, is this LLM likely to pass the orthopaedic surgery written boards? (3) Does increasing question taxonomy affect the LLM's ability to select the correct answer choices?

This study randomly selected 400 of 3840 publicly available questions based on the Orthopaedic In-Training Examination and compared the mean score with that of residents who took the test over a 5-year period. Questions with figures, diagrams, or charts were excluded, including five questions the LLM could not provide an answer for, resulting in 207 questions administered with the raw score recorded. The LLM's results were compared with the Orthopaedic In-Training Examination ranking of orthopaedic surgery residents. Based on the findings of an earlier study, a pass-fail cutoff was set at the 10th percentile. Answered questions were then categorized based on the Buckwalter taxonomy of recall, which deals with increasingly complex levels of interpretation and application of knowledge; the LLM's performance across taxonomic levels was compared using a chi-square test.

ChatGPT selected the correct answer 47% (97 of 207) of the time, and 53% (110 of 207) of the time it answered incorrectly. Based on prior Orthopaedic In-Training Examination testing, the LLM scored in the 40th percentile for postgraduate year (PGY) 1s, the eighth percentile for PGY2s, and the first percentile for PGY3s, PGY4s, and PGY5s; based on the latter finding (and using a predefined cutoff of the 10th percentile of PGY5s as the threshold for a passing score), it seems unlikely that the LLM would pass the written board examination. The LLM's performance decreased as question taxonomy level increased (it answered 54% [54 of 101] of Tax 1 questions correctly, 51% [18 of 35] of Tax 2 questions correctly, and 34% [24 of 71] of Tax 3 questions correctly; p = 0.034).

Although this general-domain LLM has a low likelihood of passing the orthopaedic surgery board examination, its testing performance and knowledge are comparable to those of a first-year orthopaedic surgery resident. The LLM's ability to provide accurate answers declines with increasing question taxonomy and complexity, indicating a deficiency in implementing knowledge. Current AI appears to perform better at knowledge and interpretation-based inquiries, and based on this study and other areas of opportunity, it may become an additional tool for orthopaedic learning and education.
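
For readers who want to see how a chi-square comparison across taxonomy levels works, the following is a minimal sketch in Python using only the counts printed in the abstract. It is not the authors' analysis: the abstract does not say which chi-square variant or software was used, and the function and variable names here are hypothetical. Note also that the per-level correct counts as printed sum to 96 rather than the overall 97, so the recomputed p-value should be expected to land near, not exactly on, the reported p = 0.034.

```python
import math

# (correct, total) per Buckwalter taxonomy level, copied from the abstract as printed.
REPORTED = {"Tax 1": (54, 101), "Tax 2": (18, 35), "Tax 3": (24, 71)}


def chi_square_independence(table):
    """Pearson chi-square statistic for a (levels x 2) table of correct/incorrect counts."""
    rows = [(correct, total - correct) for correct, total in table.values()]
    grand_total = sum(sum(row) for row in rows)
    col_totals = [sum(row[j] for row in rows) for j in range(2)]

    stat = 0.0
    for row in rows:
        row_total = sum(row)
        for j in range(2):
            # Expected count if accuracy did not depend on taxonomy level.
            expected = row_total * col_totals[j] / grand_total
            stat += (row[j] - expected) ** 2 / expected

    dof = (len(rows) - 1) * (2 - 1)
    return stat, dof


def p_value_two_dof(stat):
    """Upper-tail p-value; for exactly 2 degrees of freedom this is exp(-stat / 2)."""
    return math.exp(-stat / 2)


if __name__ == "__main__":
    stat, dof = chi_square_independence(REPORTED)
    print(f"chi-square = {stat:.2f}, dof = {dof}, p = {p_value_two_dof(stat):.3f}")
```

The closed-form p-value is used only because three taxonomy levels and two outcomes give 2 degrees of freedom; for other table shapes a statistics library would be the more natural choice.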

Journal: Clinical Orthopaedics & Related Research
Publication Date: May 23, 2023
Citations: 70

Similar Papers

Artificial Intelligence Large Language Models Are Nearly Equivalent to Fourth-Year Orthopaedic Residents on the Orthopaedic In-Training Examination: A Cause for Concern or Excitement?
Ashraf Nawari ... Brian Feeley
Journal of Orthopaedic Experience & Innovation | VOL. 6
10 Jan 2025

Generative Artificial Intelligence Performs at a Second-Year Orthopedic Resident Level.
Zachary C Lum ... Robert L Randall
Cureus | VOL. 16
01 Mar 2024

Allopathic and Osteopathic Residents Perform Similarly on the Orthopedic In-Training Examination (OITE)
Carolina Gomez ... Mary K Mulcahey
Journal of Surgical Education | VOL. 80
26 Feb 2023

Orthopaedic Surgery Residency Rotations and Correlation With Orthopaedic In-Training Examination Performance
Aaron I Karlen ... Ann E Van Heest
Journal of Surgical Education | VOL. 75
12 Feb 2018