Abstract
Purpose: To assess the ability of ChatGPT, Bard, and BingChat to generate accurate orthopaedic diagnoses and corresponding treatments by comparing their performance on the Orthopaedic In-Training Examination (OITE) with that of orthopaedic trainees.
Methods: OITE question sets from 2021 and 2022 were compiled into a set of 420 questions. ChatGPT (GPT-3.5), Bard, and BingChat were instructed to select one of the provided answer choices for each question. Accuracy on the composite question set was recorded and compared with that of human cohorts, including medical students and orthopaedic residents stratified by post-graduate year (PGY).
Results: ChatGPT correctly answered 46.3% of composite questions, whereas BingChat correctly answered 52.4% and Bard correctly answered 51.4% of questions on the OITE. After excluding image-associated questions, the overall accuracies of ChatGPT, BingChat, and Bard improved to 49.1%, 53.5%, and 56.8%, respectively. Medical students and orthopaedic residents (PGY-1 through PGY-5) correctly answered 30.8%, 53.1%, 60.4%, 66.6%, 70.0%, and 71.9% of questions, respectively.
Conclusion: ChatGPT, Bard, and BingChat are AI models that answered OITE questions with an accuracy similar to that of first-year orthopaedic surgery residents. ChatGPT, Bard, and BingChat achieved this result without access to the images or other supplementary media provided to human test takers.