Performance of trauma-trained large language models on surgical assessment questions: A new approach in resource identification

Arnav Mahajan, Andrew Tran, Esther S Tseng, John J Como, Kevin M El-Hayek, Prerna Ladha, Vanessa P Ho

doi:10.1016/j.surg.2024.08.026

Abstract

Background: Large language models have successfully navigated simulated medical board examination questions. However, whether and how language models can be used in surgical education is less well understood. Our study evaluates the efficacy of domain-specific large language models in curating study materials for surgical board-style questions.

Methods: We developed EAST-GPT and ACS-GPT, custom large language models with domain-specific knowledge drawn from published guidelines of the Eastern Association for the Surgery of Trauma (EAST) and the American College of Surgeons (ACS) Trauma Quality Programs. The performance of EAST-GPT, ACS-GPT, and an untrained GPT-4 was assessed on trauma-related questions from the Surgical Education and Self-Assessment Program (SESAP, 18th edition). The large language models were asked to choose answers and provide answer rationales. Rationales were assessed against an educational framework with 5 domains: accuracy, relevance, comprehensiveness, evidence base, and clarity.

Results: Ninety guidelines were used to train EAST-GPT and 10 to train ACS-GPT. All large language models were tested on 62 trauma questions. EAST-GPT answered 76% correctly, whereas ACS-GPT answered 68% correctly. Both models outperformed ChatGPT-4 (P < .05), which answered 45% correctly. For answer rationales, EAST-GPT achieved the greatest mean scores across all 5 educational framework metrics. ACS-GPT scored lower than ChatGPT-4 in comprehensiveness and evidence base; however, these differences were not statistically significant.

Conclusion: Our study presents a novel methodology for identifying test-preparation resources by training a large language model to answer board-style multiple-choice questions. Both trained models outperformed ChatGPT-4, with answers that were accurate, relevant, and evidence-based. The potential implications of integrating such AI into surgical education must be explored.
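
As a rough arithmetic check on the reported accuracies, the percentages can be converted back to counts out of the 62 questions and compared with a two-proportion z-test. This is a minimal sketch under stated assumptions: the counts (47, 42, and 28 correct) are back-calculated from the rounded percentages rather than reported, and the abstract does not state which statistical test the authors used. Because all three models answered the same questions, a paired test such as McNemar's test would be better suited if per-question results were available. The helper names `implied_correct` and `two_proportion_z_test` are hypothetical.

```python
import math

# Assumption: counts are back-calculated from the rounded percentages in the
# abstract; the study's per-question data are not reproduced here.
N_QUESTIONS = 62
REPORTED_PCT = {"EAST-GPT": 76, "ACS-GPT": 68, "ChatGPT-4": 45}


def implied_correct(pct: int, n: int) -> int:
    """Hypothetical helper: smallest number of correct answers out of n
    whose percentage rounds to pct."""
    for k in range(n + 1):
        if round(100 * k / n) == pct:
            return k
    raise ValueError(f"no count out of {n} rounds to {pct}%")


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Pooled two-proportion z-test; returns (z, two-sided p-value).
    Treats the two samples as independent, ignoring that questions are shared."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


counts = {name: implied_correct(pct, N_QUESTIONS) for name, pct in REPORTED_PCT.items()}
results = {}
for model in ("EAST-GPT", "ACS-GPT"):
    z, p = two_proportion_z_test(
        counts[model], N_QUESTIONS, counts["ChatGPT-4"], N_QUESTIONS
    )
    results[model] = (z, p)
    print(
        f"{model}: {counts[model]}/{N_QUESTIONS} vs ChatGPT-4 "
        f"{counts['ChatGPT-4']}/{N_QUESTIONS}, z = {z:.2f}, p = {p:.4f}"
    )
```

Under these assumptions both comparisons fall below .05, which is consistent with the abstract, although the exact P values the authors obtained may differ depending on the test they applied.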

More From: Surgery

Similar Papers

A self-supervised language model selection strategy for biomedical question answering
Negar Arabzadeh ... Ebrahim Bagheri
Journal of Biomedical Informatics | VOL. 146
16 Sep 2023

PreparedLLM: effective pre-pretraining framework for domain-specific large language models
Zhou Chen ... Yuqi Bai
Big Earth Data | VOL. ahead-of-print
15 Sep 2024

How Can IJDS Authors, Reviewers, and Editors Use (and Misuse) Generative AI?
Galit Shmueli ... Bianca Maria Colosimo
INFORMS Journal on Data Science | VOL. 2
01 Apr 2023

Performance of Large Language Models on a Neurology Board–Style Examination
Marc Cicero Schubert ... Varun Venkataramani
JAMA Network Open | VOL. 6
07 Dec 2023