Introduction
Artificial intelligence (AI) models built on large language models (LLMs) and trained on non-specific domains have gained attention for their innovative information processing. As AI advances, it is essential to regularly evaluate these tools' competency to maintain high standards, prevent errors or biases, and avoid flawed reasoning or misinformation that could harm patients or spread inaccuracies. Our study aimed to determine the performance of Chat Generative Pre-trained Transformer (ChatGPT) by OpenAI and Google BARD (BARD) in orthopedic surgery, assess performance based on question type, contrast performance between the two AIs, and compare AI performance to that of orthopedic residents.

Methods
We administered 757 Orthopedic In-Training Examination (OITE) questions to ChatGPT and BARD. After image-related questions were excluded, the AIs answered 390 multiple-choice questions, each categorized into one of 10 sub-specialties (basic science, trauma, sports medicine, spine, hip and knee, pediatrics, oncology, shoulder and elbow, hand, and foot and ankle) and one of three taxonomy classes (recall, interpretation, and application of knowledge). Statistical analysis compared the number of questions answered correctly by each AI model, each model's performance within the categorized sub-specialties, and each model's performance against the results of orthopedic residents classified by post-graduate year (PGY) level.

Results
BARD answered more questions correctly overall (58% vs 54%, p<0.001). ChatGPT performed better in sports medicine and basic science and worse in hand surgery, while BARD performed better in basic science (p<0.05). Both AIs performed better on recall questions than on application-of-knowledge questions (p<0.05). Based on previous resident data, AI performance ranked in the 42nd-96th percentile for post-graduate year ones (PGY1s), 27th-58th for PGY2s, 3rd-29th for PGY3s, 1st-21st for PGY4s, and 1st-17th for PGY5s.

Discussion
ChatGPT excelled in sports medicine but fell short in hand surgery, while both AIs performed well in the basic science sub-specialty and poorly on application-of-knowledge taxonomy questions. BARD performed better than ChatGPT overall. Although AI performance reached the level of a second-year (PGY2) orthopedic resident, it fell short of passing the American Board of Orthopedic Surgery (ABOS) examination. Its strength in recall-based questions highlights its potential as an orthopedic learning and educational tool.
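The abstract reports p-values for these proportion comparisons but does not name the statistical test used. As a minimal, hypothetical sketch, the following reconstructs the overall accuracy comparison from the Results (58% vs 54% of 390 text-only questions) using an unpaired two-proportion chi-square test; all counts are back-calculated from the reported percentages and are illustrative only.

```python
# Hypothetical reconstruction of the overall BARD vs ChatGPT comparison.
# Counts are back-calculated from the reported percentages (58% vs 54% of 390);
# the abstract does not state which test was used, so a two-proportion
# chi-square test is assumed here purely for illustration.
from scipy.stats import chi2_contingency

N_QUESTIONS = 390                              # text-only OITE questions
bard_correct = round(0.58 * N_QUESTIONS)       # ~226 correct
chatgpt_correct = round(0.54 * N_QUESTIONS)    # ~211 correct

# 2x2 contingency table: rows = AI model, columns = (correct, incorrect)
table = [
    [bard_correct, N_QUESTIONS - bard_correct],
    [chatgpt_correct, N_QUESTIONS - chatgpt_correct],
]
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
```

Because both models answered the same 390 questions, a paired analysis such as McNemar's test would be the more appropriate comparison and may be what produced the reported p<0.001; it requires per-question concordance data that the abstract does not provide, so the unpaired sketch above is illustrative only.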