Introduction: Multidisciplinary team (MDT) discussions are integral to Transcatheter Aortic Valve Implantation (TAVI) decision making. The ubiquity of large language models (LLMs) and low-code/no-code platforms has enabled clinician-led solution development. Specialised chatbots or ‘agents’ have evolved into multi-agent systems that can emulate human collaboration. We assess the performance of an artificial intelligence (AI) multi-agent TAVI MDT. Methods: Four de-identified TAVI cases from two metropolitan Australian hospitals were assessed by a mock human TAVI MDT (h-MDT) and an AI multi-agent TAVI MDT (ai-MDT). The ai-MDT was created with Agentflow within Flowise AI™ and had a hierarchical multi-agent architecture suited to the complex reasoning required for TAVI MDT simulation (figure). LLM limitations necessitated that the ai-MDT rely on imaging reports rather than clinical images. The h-MDT and ai-MDT consisted of similar team members. Outputs from the h-MDT and ai-MDT were adjudicated by a panel of four blinded TAVI doctors, who determined whether each output was human or AI generated and assigned a SMIC score (4-12, 4=good, 12=poor) that assessed structure, missing information, incorrect information and clinical utility. Time durations for the h-MDT and ai-MDT were recorded. Results: Adjudicators differentiated human vs. AI output 100% of the time, and ai-MDT output had better SMIC scores than h-MDT output (U-stat 213, p=0.0011). The ai-MDT outperformed the h-MDT in the domains of structure, missing information and clinical utility but was not statistically different in the incorrect information domain (U-stat 132, p=0.88). The average time per case was 15 minutes and 45 seconds for the h-MDT compared with 97 seconds for the ai-MDT. Conclusion: This study demonstrates the potential of LLM-based multi-agent systems as a clinical adjunct in highly specialised multidisciplinary clinical meetings.
AI responses were superior for structure, clinical utility and missing information and not statistically different for incorrect information compared with human responses, which suggests that hallucinations remain an issue with generative AI. Time was saved, but image interpretation still requires human input for now. Cognitive AI continues to require human supervision for implementation.
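The scoring and comparison described above can be illustrated with a minimal Python sketch. Several things here are assumptions and do not come from the study: each SMIC domain is taken to be scored from 1 to 3 (which is consistent with the stated 4-12 total), the score lists are hypothetical placeholders and not study data, and the names `SmicRating`, `mann_whitney_u` and `two_sided_p` are illustrative. The abstract does not say which software produced the reported U statistics, so the p-value below uses a tie-corrected normal approximation without continuity correction, which is only rough for small samples.

```python
import math
from collections import Counter
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class SmicRating:
    """One adjudicator's rating of one MDT output (assumed 1-3 per domain)."""
    structure: int
    missing_information: int
    incorrect_information: int
    clinical_utility: int

    def __post_init__(self):
        for value in (self.structure, self.missing_information,
                      self.incorrect_information, self.clinical_utility):
            if value not in (1, 2, 3):
                raise ValueError("each domain score is assumed to be 1, 2 or 3")

    @property
    def total(self) -> int:
        # Total SMIC score: 4 (good) to 12 (poor).
        return (self.structure + self.missing_information
                + self.incorrect_information + self.clinical_utility)


def mann_whitney_u(x, y) -> float:
    """U statistic for x: pairs with x_i > y_j count 1, ties count 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a, b in product(x, y))


def two_sided_p(u: float, x, y) -> float:
    """Two-sided p-value from a tie-corrected normal approximation."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    tie_counts = Counter(list(x) + list(y)).values()
    tie_term = sum(t ** 3 - t for t in tie_counts) / (n * (n - 1))
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term))
    if sigma == 0:
        return 1.0  # all observations identical
    z = (u - n1 * n2 / 2) / sigma
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical total SMIC scores (NOT study data); lower is better.
h_mdt_totals = [8, 9, 7, 10]
ai_mdt_totals = [
    SmicRating(1, 2, 1, 1).total,  # 5
    SmicRating(2, 1, 2, 1).total,  # 6
    SmicRating(1, 1, 1, 1).total,  # 4
    SmicRating(1, 2, 2, 1).total,  # 6
]

u_stat = mann_whitney_u(h_mdt_totals, ai_mdt_totals)
p_value = two_sided_p(u_stat, h_mdt_totals, ai_mdt_totals)

# Reported mean durations per case: 15 min 45 s (h-MDT) vs 97 s (ai-MDT).
h_seconds = 15 * 60 + 45
speedup = h_seconds / 97
```

With the reported durations, `h_seconds` is 945 and `speedup` is roughly 9.7, so the ai-MDT took about a tenth of the time the h-MDT needed per case.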