Artificial intelligence (AI) represents an exciting shift for orthopaedic surgery, where its role is rapidly evolving. ChatGPT is an AI language model that is preeminent among those leading the mass consumer uptake of AI. Artamonov and colleagues compared ChatGPT with orthopaedic surgeons when considering the diagnosis and management of anterior shoulder instability; they found a limited correlation between them. This study aims to further explore how reliable ChatGPT is compared with orthopaedic surgeons. Twenty-three statements were extracted from the article "Building Consensus: Development of a Best Practice Guideline (BPG) for Surgical Site Infection (SSI) Prevention in High-risk Pediatric Spine Surgery" by Vitale and colleagues. These included 14 consensus statements and an additional 9 statements that did not reach consensus. ChatGPT was asked to state the extent to which it agreed with each statement. ChatGPT appeared to demonstrate a fair correlation with most expert responses to the 14 consensus statements. It appeared less emphatic than the experts, often stating that it "agreed" with a statement where the most frequent response from experts was "strongly agree." It reached the opposite conclusion to the majority of experts on a single consensus statement regarding the use of ultraviolet light in the operating theatre; ChatGPT may have been drawing on more up-to-date literature published after the consensus statement. This study demonstrated a reasonable correlation between ChatGPT and orthopaedic surgeons when providing simple responses. ChatGPT's function may be limited when asked to provide more complex answers. This study adds to a growing body of discussion and evidence exploring AI and whether its function is reliable enough to enter the high-accountability world of health care. This article is of high clinical relevance to orthopaedic surgery given the rapidly emerging applications of AI.
This creates a need to understand the level to which AI can function in the clinical setting and the risks that would entail.