Abstract

As generative artificial intelligence (GenAI) tools continue to advance, rigorous evaluations are needed to understand their capabilities relative to experienced clinicians and nurses. The aim of this study was to objectively compare the diagnostic accuracy and response formats of ICU nurses and GenAI models, with a qualitative interpretation of the quantitative results. This formative study used four written clinical scenarios, representative of real ICU patient cases, to simulate diagnostic challenges. The scenarios were developed by expert nurses and validated against current literature. Seventy-four ICU nurses completed a simulation-based assessment built on these scenarios. Simultaneously, we asked ChatGPT-4 and Claude-2.0 to provide initial assessments and treatment recommendations for the same scenarios. The responses from ChatGPT-4 and Claude-2.0 were then scored by certified ICU nurses for accuracy, completeness and response. Nurses consistently achieved higher diagnostic accuracy than the GenAI models on open-ended scenarios, though certain models matched or exceeded human performance on standardized cases. Reaction times also diverged substantially. Qualitative differences in response format emerged, such as concision versus verbosity. Variation in GenAI model performance across cases highlighted generalizability challenges. While GenAI demonstrated valuable skills, experienced nurses outperformed it in open-ended domains requiring holistic judgement. Continued development to strengthen generalized decision-making abilities is warranted before autonomous clinical integration. Response format interfaces should consider leveraging the distinct strengths of each. Rigorous mixed-methods research involving diverse stakeholders can help iteratively inform safe, beneficial human-GenAI partnerships centred on experience-guided care augmentation.
This mixed-methods simulation study provides formative insights into optimizing collaborative models of GenAI and nursing knowledge to support patient assessment and decision-making in intensive care. The findings can help guide the development of explainable GenAI decision support tailored to critical care environments. Neither patients nor the public were involved in the design or implementation of the study, or in the analysis and interpretation of the data.
