Abstract

BACKGROUND CONTEXT: Venous thromboembolism is a serious complication of elective spine surgery. However, the use of thromboembolic chemoprophylaxis in this patient population is controversial because of the potential for an increased risk of epidural hematoma. ChatGPT is an artificial intelligence model that may be able to generate recommendations for thromboembolic prophylaxis in spine surgery.

PURPOSE: To evaluate the accuracy of ChatGPT recommendations for thromboembolic prophylaxis in spine surgery.

STUDY DESIGN/SETTING: Comparative analysis.

PATIENT SAMPLE: None.

OUTCOME MEASURES: Accuracy, over-conclusiveness, supplementary content, and incompleteness of ChatGPT responses compared with the North American Spine Society (NASS) clinical guidelines.

METHODS: ChatGPT was prompted with questions from the 2009 NASS clinical guidelines for antithrombotic therapies, and its responses were evaluated for concordance with the guidelines. ChatGPT-3.5 responses were obtained on March 5, 2023, and ChatGPT-4.0 responses were obtained on April 7, 2023. A ChatGPT response was classified as accurate if it did not contradict the clinical guideline. Three additional categories were used to further characterize the ChatGPT responses relative to the NASS guidelines: over-conclusive, supplementary, and incomplete. A response was classified as over-conclusive if it made a recommendation where the NASS guideline did not provide one, supplementary if it included additional relevant information not specified by the NASS guideline, and incomplete if it failed to provide relevant information included in the NASS guideline.

RESULTS: Twelve clinical guidelines were evaluated in total. Compared with the NASS clinical guidelines, ChatGPT-3.5 was accurate in 4 (33%) of its responses, while ChatGPT-4.0 was accurate in 11 (92%). ChatGPT-3.5 was over-conclusive in 6 (50%) responses, while ChatGPT-4.0 was over-conclusive in 1 (8%). ChatGPT-3.5 provided supplementary information in 8 (67%) responses, and ChatGPT-4.0 did so in 11 (92%). Four (33%) responses from ChatGPT-3.5 and 4 (33%) responses from ChatGPT-4.0 were incomplete.

CONCLUSIONS: ChatGPT was able to provide recommendations for thromboembolic prophylaxis with reasonable accuracy. ChatGPT-3.5 tended to cite nonexistent sources and was more likely to give specific recommendations, while ChatGPT-4.0 was more conservative in its answers. As ChatGPT is continuously updated, further validation is needed before it can be used to guide clinical practice.

