Abstract

Generative artificial intelligence (AI) large language models are powerful and increasingly accessible tools with potential applications in health care education and training. The annual Orthopaedic In-Training Examination (OITE) is widely used to assess resident academic progress and preparation for the American Board of Orthopaedic Surgery Part 1 Examination. OpenAI's ChatGPT and Google's Bard generative language models were administered the 2022 OITE. Question stems that contained images were input first without and then with a text-based description of the imaging findings. ChatGPT answered 69.1% of questions correctly; when provided with text describing the accompanying media, this increased to 77.8% correct. In contrast, Bard answered 49.8% of questions correctly, which increased to 58% correct when text describing imaging in the question stems was provided (P<.0001). ChatGPT was most accurate on questions in the shoulder category, with 90.9% correct, whereas Bard performed best in the sports category, with 65.4% correct. ChatGPT performed above the published mean of Accreditation Council for Graduate Medical Education orthopedic resident test-takers (66%). There is significant variability in the accuracy of publicly available AI models on the OITE. AI generative language software may play numerous roles in the future of orthopedic education, including simulating patient presentations and clinical scenarios, customizing individual learning plans, and driving evidence-based case discussion. Further research and collaboration within the orthopedic community are required to safely adopt these tools and minimize the risks associated with their use. [Orthopedics. 202x;4x(x):xx-xx.].
