Abstract

Purpose: Large language models, a subset of artificial intelligence, have immense potential to support human tasks. Their role in science and medicine, fields that require strong critical thinking and analytical skills, remains unclear. The objective of our study was to evaluate GPT-4's ability to assess postoperative complications after renal surgeries.

Materials and methods: Discharge summaries were compiled, and patient information was deidentified using a Python-based program. Prompts were engineered for GPT-4 to assess for the presence of postoperative complications. GPT-4 was further asked to assign each complication a Clavien-Dindo classification and an institution-specific category. GPT-4's database was compared with a human-curated database. Discrepancies were manually reviewed to calculate match and accuracy rates.

Results: Approximately 944 renal surgeries were conducted from August 2005 to March 2022. There was a 79.6% match rate between GPT-4 and human-curated data in detecting postoperative complications. Accuracy rates were 86.7% for GPT-4 and 92.9% for the human-curated data. A subgroup of 139 patients had a complication detected by both GPT-4 and human curation, with Clavien-Dindo classification and category information available. In this subgroup, the overall match rate was 37.4% for Clavien-Dindo grade and 55.4% for category.

Conclusions: GPT-4 was able to accurately detect whether any postoperative complications occurred. It struggled with the more complex task of further analyzing complications, especially Clavien-Dindo classification, which requires more critical thinking and interpretation. While GPT-4 is not yet ready for advanced postoperative complication analysis, it can still be used to support clinicians in this endeavor.
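The comparison described in the methods can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the boolean "complication present" encoding, and the toy labels are all assumptions made for illustration. It also assumes that "match rate" means case-level agreement between GPT-4 and the human-curated database, and that "accuracy" means agreement with a reference label obtained from manual review of the discrepancies.

```python
from typing import List, Sequence


def agreement_rate(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Fraction of cases on which two label sources agree."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def discrepant_cases(a: Sequence[bool], b: Sequence[bool]) -> List[int]:
    """Indices of cases that would need manual review."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Toy labels (hypothetical, not study data): True = complication present.
gpt4 = [True, False, True, True, False]
human = [True, False, False, True, True]
# Hypothetical reference labels after manual review of the discrepancies.
reference = [True, False, True, True, True]

match_rate = agreement_rate(gpt4, human)          # GPT-4 vs human-curated
gpt4_accuracy = agreement_rate(gpt4, reference)   # GPT-4 vs reviewed reference
human_accuracy = agreement_rate(human, reference) # human vs reviewed reference
to_review = discrepant_cases(gpt4, human)

print(f"match rate: {match_rate:.1%}")
print(f"GPT-4 accuracy: {gpt4_accuracy:.1%}")
print(f"human accuracy: {human_accuracy:.1%}")
print(f"cases to review: {to_review}")
```

Under these assumptions, only the discrepant cases need manual adjudication, because cases where both sources agree are taken as correct. That is a simplification; the abstract does not state how agreed-upon cases were verified.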

