Abstract

Background: AI algorithms are used in many healthcare settings, though concerns about underlying bias and accuracy persist. This study assessed the accuracy of AI in the investigation and diagnosis of acute surgical patients, compared with clinicians’ performance.

Methods: Consecutive patients who attended a surgical same-day emergency care (SDEC) unit over a two-week period in October 2023 were identified. The documented clerking history was entered into ChatGPT (v3.5), and the algorithm was asked to give the most likely diagnosis and a list of recommended investigations. Clinical records were compared with the AI recommendations. Primary outcome measures were the investigative and diagnostic accuracy of the AI.

Results: Eighty-eight patients were identified. The most common presenting complaints were abdominal pain (57 patients, 64.8%), abdominal swelling (10 patients, 11.4%) and abscess (8 patients, 9.1%). AI correctly diagnosed 32 patients (36%). AI accuracy was greatest for patients with an abscess (6 patients, 75%) and lower for abdominal masses (4 patients, 40%) and abdominal pain (12 patients, 21%). AI recommended fewer investigations than doctors (median 6 versus 7 tests). AI recommended fewer blood tests (median 3 versus 5 tests) yet was more likely to recommend cross-sectional imaging and microbiological investigations.

Conclusions: At present, ChatGPT lacks the diagnostic and investigative maturity to confidently replace a formal surgical clerking: it misdiagnosed almost 80% of patients with abdominal pain and under-investigated most acute general surgical patients. Development of a bespoke machine-learning algorithm may improve accuracy and enhance the role of AI in triage and initial investigation.
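For readers wishing to reproduce the Methods step programmatically, the sketch below shows one way the per-patient query could be issued. The study describes entering clerking histories into the ChatGPT (v3.5) interface; the OpenAI Python client, model name and prompt wording here are illustrative assumptions, not the authors’ protocol.

    # Minimal sketch of the study's per-patient query, assuming the
    # OpenAI Python client (openai>=1.0) rather than the ChatGPT web
    # interface the authors describe. Model name and prompt wording
    # are assumptions for illustration only.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def query_diagnosis(clerking_history: str) -> str:
        """Ask the model for a most likely diagnosis and recommended investigations."""
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {
                    "role": "user",
                    "content": (
                        "Based on the following surgical clerking history, state "
                        "the most likely diagnosis and list the investigations "
                        f"you would recommend.\n\n{clerking_history}"
                    ),
                }
            ],
        )
        return response.choices[0].message.content

The returned free-text answer would then be compared against the documented clinical diagnosis and investigations, as the study's outcome measures describe.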
