Abstract

ChatGPT is a natural language processing chatbot with increasing applicability to the medical workflow. Although ChatGPT has been shown to be capable of passing the American Board of Neurological Surgery board examination, the chatbot has never been evaluated on triaging and diagnosing novel neurosurgical scenarios without defined answer choices. In this study, we assess ChatGPT's capability to determine the emergent nature of neurosurgical scenarios and to make diagnoses based on the information one would find in a neurosurgical consult. Thirty clinical scenarios were given to 3 attendings, 4 residents, 2 physician assistants, and 2 subinterns. Participants were asked to determine whether each scenario constituted an urgent neurosurgical consultation and what the most likely diagnosis was. Attending responses were used to establish a consensus answer key. Generative Pre-trained Transformer (GPT) 3.5 and GPT 4 were given the same questions, and their responses were compared with those of the other participants. GPT 4 was 100% accurate in both diagnosis and triage of the scenarios. In triaging each situation, GPT 3.5 had an accuracy of 92.59%, slightly below that of a PGY1 (96.3%), with 88.24% sensitivity, 100% specificity, 100% positive predictive value, and 83.3% negative predictive value. When making a diagnosis, GPT 3.5 had an accuracy of 92.59%, which was higher than that of the subinterns and similar to that of the resident responders. GPT 4 is able to diagnose and triage neurosurgical scenarios at the level of a senior neurosurgical resident, and there has been a clear improvement between GPT 3.5 and GPT 4. Recent updates providing internet access and the ability to direct ChatGPT's functionality will likely further improve its utility in neurosurgical triage.
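
The triage performance figures above are standard binary-classification metrics derived from a 2x2 confusion matrix. The sketch below shows how they relate; the counts used are illustrative assumptions chosen to be consistent with the reported percentages (27 scored triage responses), not figures taken directly from the study.

```python
# Minimal sketch of how the triage metrics reported above are computed
# from a 2x2 confusion matrix. The counts below are assumed values
# consistent with the reported percentages, not data from the study.

def triage_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Return standard binary-classification metrics for triage decisions."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }

# Assumed counts: 15 urgent scenarios correctly flagged, 2 missed,
# 10 non-urgent scenarios correctly deferred, 0 falsely flagged as urgent.
metrics = triage_metrics(tp=15, fp=0, tn=10, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
# -> accuracy ~92.59%, sensitivity ~88.24%, specificity 100%,
#    PPV 100%, NPV ~83.33%
```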
