Abstract

Background
Endoscopic assessment of disease severity in ulcerative colitis (UC) is crucial in drug development and impacts clinical outcomes. We developed ArgesUCEIS, a set of three AI algorithms that estimate the Ulcerative Colitis Endoscopic Index of Severity (UCEIS) component scores, which quantify the degree of bleeding, ulceration/erosion, and vascular obliteration in endoscopy videos. Our goal was to validate the locked algorithms, which automate UCEIS component scoring, using prospective UC clinical trial data.

Methods
We trained three distinct AI algorithms (ArgesUCEIS) to estimate the degree of bleeding, ulceration/erosion, and vascular obliteration, respectively, from human readings of endoscopy videos in a UC clinical trial (UNIFI: NCT02407236, Phase 3, 965 subjects, 3128 videos), with 20% of the total data kept as a holdout set. Model training had two steps: 1) training a feature extraction module using self-supervised learning (SSL); 2) training a small supervised transformer network with an attention-based classifier on the SSL features to estimate the corresponding UCEIS component score, with the attention layer potentially identifying representative frames. We validated the locked AI models on an independent UC trial (QUASAR: NCT04033445, Phase 2b induction study, 313 subjects, 615 videos), scoring the baseline and week 12 endoscopic visits. AI model scores were compared with human-read UCEIS component scores using AUC, accuracy, and F1 score. Additionally, a Spearman correlation analysis assessed the relationship between the UCEIS component model scores and human-read Mayo endoscopic subscores (MES) at both visits.

Results
All three locked component models were evaluated on the independent QUASAR data and showed moderate to high AUC, accuracy, and F1 score, comparable to the UNIFI holdout set results (Table 1), indicating that the models generalize.
Figure 1 shows representative ("high attention") frames for each model, providing insight into the frame-level features ArgesUCEIS uses to estimate disease severity. Spearman correlation analysis indicated small to moderate correlations of the bleeding, ulceration/erosion, and vascular AI models with human-read MES at baseline (0.22, 0.48, 0.37) and week 12 (0.50, 0.66, 0.61), respectively, all with p-values below 0.05.

Conclusion
ArgesUCEIS, a set of three AI models for estimating UCEIS component scores, was validated using prospective UC clinical trial data and holdout data. The models showed good performance, with moderate to high AUC and accuracy, and automate endoscopic disease severity assessment. They can be used alongside or independently of MES, and they have the potential to improve quality and efficiency and to enhance our understanding of a drug's endoscopic impact in UC clinical trials.
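
The agreement analysis described in the Methods (accuracy and F1 against human-read component scores, and Spearman correlation against human-read MES) can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' evaluation code: the function names, the choice of macro-averaged F1, and the toy score lists are all assumptions made for illustration, and AUC is omitted because the abstract does not say how it was computed for multi-level scores.

```python
from math import sqrt


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the tie group while the next sorted value is identical.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # 1-based mean rank of the tie group
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed inputs.

    Raises ZeroDivisionError if either input is constant (rho is undefined).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


def accuracy(truth, pred):
    """Fraction of videos where the model score equals the human-read score."""
    return sum(t == p for t, p in zip(truth, pred)) / len(truth)


def macro_f1(truth, pred):
    """Unweighted mean of per-class F1 (macro averaging is an assumption here)."""
    scores = []
    for label in sorted(set(truth) | set(pred)):
        tp = sum(t == label and p == label for t, p in zip(truth, pred))
        fp = sum(t != label and p == label for t, p in zip(truth, pred))
        fn = sum(t == label and p != label for t, p in zip(truth, pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data, one entry per video (NOT trial data).
human_bleeding = [0, 1, 2, 3, 1, 0, 2, 3]  # human-read component score
model_bleeding = [0, 1, 2, 2, 1, 1, 2, 3]  # model-estimated component score
human_mes = [1, 2, 3, 3, 2, 1, 2, 3]       # human-read Mayo endoscopic subscore

print("accuracy:", accuracy(human_bleeding, model_bleeding))
print("macro F1:", macro_f1(human_bleeding, model_bleeding))
print("Spearman vs MES:", spearman(model_bleeding, human_mes))
```

Average ranks are used for ties because both the component scores and MES take only a few ordinal levels, so tied values are the norm rather than the exception.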