Abstract

Background

The Mayo endoscopic subscore (MES), assigned by human readers, is widely used to assess endoscopic disease severity in ulcerative colitis (UC) clinical trials. AI-based automation of the MES could reduce inter-rater variability and allow for the development of more sensitive endoscopic measures. This report assesses whether a previously trained (locked) algorithm is suitable for automating full colon or segment-level MES scoring in a prospective UC clinical trial.

Methods

Endoscopy videos from two UC clinical trials (UNIFI: NCT02407236, Phase 3, 965 subjects, 3128 videos; and JAK-UC: NCT01959282, Phase 2, 211 subjects, 448 videos) were used to train an AI-based MES classifier, with 20% of the total data retained as a holdout set. Model training had two steps: 1) training a feature extraction module using self-supervised learning (SSL), and 2) supervised training of a small transformer network with an attention-based classifier that uses the SSL features to estimate the full colon MES. Videos from an independent, prospective UC trial (QUASAR: NCT04033445, Phase 2b induction study, 313 subjects, 615 videos) were used to validate the locked AI model. MES scoring in QUASAR included full colon MES values and additional MES values for three left colon segments: descending colon, sigmoid colon, and rectum. AI-model and human reader scores were compared using AUC, accuracy, F1 score, and Fleiss kappa. A non-inferiority test was also conducted to determine whether AI- and human-derived full colon MES values were interchangeable.

Results

On the QUASAR data, the full colon MES showed AUC, accuracy, and F1 scores of 0.810, 0.687, and 0.693, respectively, comparable to the results obtained on the UNIFI holdout data (0.803, 0.645, and 0.647). The Fleiss kappa score was 0.682, comparable to the inter-rater agreement between the two human readers, local and central (Fleiss kappa = 0.712). The non-inferiority test (p-value < 0.05) indicated that the AI-computed full colon MES readout was interchangeable with that of human readers. Similar performance was observed for the AI-computed segment-level MES (descending colon, sigmoid colon, and rectum), as shown in Table 1. This result demonstrates the model's effectiveness at scoring the segment-level MES despite not being trained with segment-level ground truth.

Conclusion

ArgesMES, an AI-based model, underwent successful prospective validation, demonstrating proficiency in automating full colon and segment-level MES scores. ArgesMES has the potential to facilitate rapid, reliable, and reproducible MES scoring at full colon and segment levels in prospective clinical trials.
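For readers unfamiliar with the agreement statistic reported above, the following is a minimal sketch of how Fleiss kappa can be computed for two raters (for example, an AI model and a human reader) who each assign an MES of 0 to 3 per video. It is not the authors' evaluation code: the function name `fleiss_kappa`, the input layout, and the example scores are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def fleiss_kappa(ratings, categories=(0, 1, 2, 3)):
    """Fleiss kappa for a fixed number of raters per item.

    ratings: list of per-video lists, each holding one MES score per rater
             (hypothetical layout, e.g. [ai_score, human_score]).
    categories: the possible MES values.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    # counts[i][j] = number of raters who gave item i the j-th category
    counts = []
    for item in ratings:
        tally = Counter(item)
        counts.append([tally[c] for c in categories])

    # Observed agreement: mean over items of the fraction of agreeing rater pairs
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_items) / n_items

    # Chance agreement from the pooled category proportions
    total = n_items * n_raters
    p_cat = [sum(row[j] for row in counts) / total for j in range(len(categories))]
    p_expected = sum(p * p for p in p_cat)

    if p_expected == 1.0:
        # Every rating falls in one category; kappa is undefined here
        return math.nan
    return (p_observed - p_expected) / (1.0 - p_expected)


# Made-up illustrative scores, not data from any trial
ai_scores = [0, 1, 2, 3, 2, 1]
human_scores = [0, 1, 2, 3, 3, 1]
kappa = fleiss_kappa([list(pair) for pair in zip(ai_scores, human_scores)])
print(f"Fleiss kappa: {kappa:.3f}")
```

Because kappa corrects observed agreement for the agreement expected by chance, it can be noticeably lower than raw accuracy when a few MES categories dominate the data.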

