Abstract

To evaluate the accuracy of Emergency Severity Index (ESI) assignments by GPT-4, a large language model (LLM), compared with senior emergency department (ED) nurses and physicians, an observational study of 100 consecutive adult ED patients was conducted. ESI scores were assigned independently by GPT-4, triage nurses, and a senior clinician, with the model and the human experts given the same patient data. GPT-4 assigned a lower median ESI score (2.0) than the human evaluators (median 3.0; p < 0.001); because lower ESI scores indicate higher acuity, this suggests the LLM overestimated patient severity. The results also revealed differences in triage approach between GPT-4 and the human evaluators, including variations in how patient age and vital signs were weighed in the ESI assignments. While GPT-4 offers a novel methodology for patient triage, its propensity to overestimate severity highlights the need for further development and calibration of LLM tools in clinical environments. The findings underscore both the potential and the limitations of LLMs in clinical decision-making and advocate cautious integration of LLMs into healthcare settings. This study adhered to relevant EQUATOR guidelines for reporting observational studies.
