Abstract

Background: Expert manual scoring of ulcerative colitis (UC) disease activity from endoscopy videos using the Modified Mayo Score (mMS) is known to have high inter-rater and intra-rater variability [1]. In clinical trials, central reviewers review endoscopy videos and provide a single score for the entire video, leading to a loss of transparency, with no evidence of scoring rationale or segmental involvement. We developed an AI-assisted severity scoring application for clinical trials to address this challenge by highlighting video segments that show pathological information and by providing a continuous approach to severity grading.

Methods: Unlike previous approaches that rely heavily on large-scale, expensive, and restrictive frame-level expert labels, we developed our models on the video-level grading available in typical clinical trial settings. We used data from the brazikumab clinical trial [NCT03616821], comprising 423 videos from 249 patients scored on the mMS scale. Our preprocessing module removes uninformative, low-quality frames and applies detection, tracking, and alignment algorithms to identify regions of interest across frames. Our EndoUC model has three components: a pathology-enriched frame identification module (PEFIM), an aggregator, and a classifier. The PEFIM identifies regions that characterise signs of UC, such as ulcers or erosions. The aggregator and classifier modules learn to combine this information at the video level. We focus on the clinical utility of the EndoUC model in two ways: (i) training task-specific classifiers that focus on the clinical decision boundaries of UC, and (ii) enhancing the scoring granularity of UC severity prediction using ordinal regression.

Results: We developed an application that enables experts to review lengthy endoscopy footage in clinical trials efficiently. The app allows seamless navigation to key regions identified by the EndoUC model and transparently reveals the basis of the AI model's assessments.
We conducted 5-fold cross-validation on the clinical trial data and evaluated the model on a held-out test set of 87 videos. The remission model (mMS 0, 1 vs 2, 3) achieved an AUC-ROC of 0.85 and an F1 of 0.87; the normal/inactive and severe models achieved AUC-ROCs of 0.81 and 0.83, respectively. The continuous grading model had an MSE of 0.42 and an MAE of 0.40.

Conclusion: We developed an AI-assisted reading tool for UC endoscopic severity assessment to help accelerate clinical trials through rapid evaluation of patient eligibility, detection of treatment response grounded in key-region evidence, and the potential for continuous grading of endoscopic changes.
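
The task-specific classifiers and the ordinal, continuous grading described above can be illustrated with a minimal sketch. It is not the EndoUC implementation. Only the remission cut (mMS 0, 1 vs 2, 3) comes from the abstract; the normal/inactive cut (0 vs 1 to 3), the severe cut (3 vs 0 to 2), the cumulative-threshold encoding for ordinal regression, and all function names are assumptions made for illustration.

```python
from typing import Dict, List, Sequence

MAX_MMS = 3  # grades run 0..3, as implied by "mMS 0, 1 vs 2, 3"


def binary_task_labels(mms: int) -> Dict[str, int]:
    """Map one video-level mMS grade to three binary training targets."""
    if not 0 <= mms <= MAX_MMS:
        raise ValueError(f"mMS grade out of range: {mms}")
    return {
        "remission": int(mms <= 1),        # stated in the abstract: 0, 1 vs 2, 3
        "normal_inactive": int(mms == 0),  # assumed cut: 0 vs 1..3
        "severe": int(mms == MAX_MMS),     # assumed cut: 3 vs 0..2
    }


def ordinal_targets(mms: int) -> List[int]:
    """Cumulative encoding (assumed): element k is 1 when the grade exceeds k."""
    return [int(mms > k) for k in range(MAX_MMS)]


def continuous_score(threshold_probs: Sequence[float]) -> float:
    """Expected grade: the sum over k of P(grade > k)."""
    return float(sum(threshold_probs))


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error between reference and predicted grades."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between reference and predicted grades."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
```

Under these assumptions, a video graded mMS 2 yields the ordinal targets `[1, 1, 0]`, and predicted threshold probabilities of 0.9, 0.6, and 0.1 give a continuous score of about 1.6, a value between the discrete grades 1 and 2. MSE and MAE then compare such continuous scores with the expert grades.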