NeuroGPT-X: toward a clinic-ready large language model.

Edward Guo,Gwynedd E Pickett,Garnette R Sutherland,Taku Sugiyama,Sarthak Sinha,Philip E Stieg,Ryojo Akagami,Madeleine De Lotbiniere-Bassett,Ossama Al-Mefty,Rahul Singh,Marcos Tatagiba,Mehul Gupta,Karl Rössler,Sanju Lama

doi:10.3171/2023.7.jns23573

Abstract

The objective was to assess the performance of a context-enriched large language model (LLM) compared with international neurosurgical experts on questions related to the management of vestibular schwannoma. Furthermore, another objective was to develop a chat-based platform incorporating in-text citations, references, and memory to enable accurate, relevant, and reliable information in real time. The analysis involved 1) creating a data set through web scraping, 2) developing a chat-based platform called neuroGPT-X, 3) enlisting 8 expert neurosurgeons across international centers to independently create questions (n = 1) and to answer (n = 4) and evaluate responses (n = 3) while blinded, and 4) analyzing the evaluation results on the management of vestibular schwannoma. In the blinded phase, all answers were assessed for accuracy, coherence, relevance, thoroughness, speed, and overall rating. All experts were unblinded and provided their thoughts on the utility and limitations of the tool. In the unblinded phase, all neurosurgeons provided answers to a Likert scale survey and long-answer questions regarding the clinical utility, likelihood of use, and limitations of the tool. The tool was then evaluated on the basis of a set of 103 consensus statements on vestibular schwannoma care from the 8th Quadrennial International Conference on Vestibular Schwannoma. Responses from the naive and context-enriched Generative Pretrained Transformer (GPT) models were consistently rated not significantly different in terms of accuracy, coherence, relevance, thoroughness, and overall performance, and they were often rated significantly higher than expert responses. Both the naive and content-enriched GPT models provided faster responses to the standardized question set than expert neurosurgeon respondents (p < 0.01). The context-enriched GPT model agreed with 98 of the 103 (95%) consensus statements. Of interest, all expert surgeons expressed concerns about the reliability of GPT in accurately addressing the nuances and controversies surrounding the management of vestibular schwannoma. Furthermore, the authors developed neuroGPT-X, a chat-based platform designed to provide point-of-care clinical support and mitigate the limitations of human memory. neuroGPT-X incorporates features such as in-text citations and references to enable accurate, relevant, and reliable information in real time. The present study, with its subspecialist-level performance in generating written responses to complex neurosurgical problems for which evidence-based consensus for management is lacking, suggests that context-enriched LLMs show promise as a point-of-care medical resource. The authors anticipate that this work will be a springboard for expansion into more medical specialties, incorporating evidence-based clinical information and developing expert-level dialogue surrounding LLMs in healthcare.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Journal: Journal of neurosurgery	Publication Date: Apr 1, 2024
Citations: 3	License type: CC BY-NC-ND 4.0

R Discovery Prime

R Discovery Prime

NeuroGPT-X: toward a clinic-ready large language model.

Abstract

Talk to us

Similar Papers

More From: Journal of neurosurgery

Lead the way for us

Similar Papers

Evaluating the Performance of Large Language Models in Hematopoietic Stem Cell Transplantation Decision Making
Ivan Civettini ... Paola Perfetti
Blood | VOL. 142
Ivan Civettini, et. al.Ivan Civettini ... Paola Perfetti
02 Nov 2023
Blood | VOL. 142

E-185 Customized generative pretrained transformer for simplified patient education of carotid angioplasty and stenting: a feasibility study
A Brake ... E Samaniego
Journal of NeuroInterventional Surgery | VOL. 16
A Brake, et. al.A Brake ... E Samaniego
01 Jul 2024
Journal of NeuroInterventional Surgery | VOL. 16

A guideline-informed language model for paediatric cardiology demonstrates high performance in answering complex medical questions
T Uden ... P Beerbaum
European Heart Journal | VOL. 45
T Uden, et. al.T Uden ... P Beerbaum
28 Oct 2024
European Heart Journal | VOL. 45

Enhancing Privacy and Security in Large-Language Models: A Zero-Knowledge Proof Approach
Shridhar Singh
International Conference on Cyber Warfare and Security | VOL. 19
Shridhar SinghShridhar Singh
21 Mar 2024
International Conference on Cyber Warfare and Security | VOL. 19

Editage

Paperpal

R Discovery

Mind the Graph

R Discovery Prime

R Discovery Prime

NeuroGPT-X: toward a clinic-ready large language model.

Abstract

Talk to us

Similar Papers

More From: Journal of neurosurgery