Abstract

Background
Endoscopic assessment of inflammation in ulcerative colitis (UC) is a therapeutic endpoint in clinical trials [1] but can be limited by inter- and intra-observer variability [2] and by a lack of sensitivity to detect the degree of inflammation throughout the colon [3]. We used a machine learning (ML) model to grade inflammation in full-length endoscopy videos at the clip level (that is, across a series of shorter video segments) in order to evaluate the uniformity of mucosal inflammation.

Methods
We used a previously developed ML model that predicts the endoscopy subscore [4], a component of the modified Mayo Score, and applied it to 49 full-length endoscopy videos randomly selected and stratified by endoscopic severity from the Phase 3 induction trial of mirikizumab in UC (NCT03518086). A human reviewer identified the point of maximal extent and the end of the procedure to isolate the withdrawal portion of each video, which was then divided into 15-, 30-, or 60-second clips. The model generated an endoscopy subscore for each clip (clip-level endoscopy subscore) and for the entire video (video-level endoscopy subscore). Variability of inflammation between clips was calculated.

Results
Assessment of the endoscopy subscore on individual video clips during withdrawal demonstrated a patchy distribution of inflammation severity, and this was consistent regardless of the video-level endoscopic severity (Fig. 1). Notably, the segment with the most severe inflammation did not always occur proximally (Fig. 1). The variability of inflammation was greater in 15-second clips (Fig. 1A) than in 30-second (Fig. 1B) or 60-second (Fig. 1C) clips. Videos with video-level endoscopy subscores of 1-3 contained the spectrum of clip-level endoscopy subscores (0-3), but the greatest proportion of clips corresponded to the video-level endoscopy subscore (Fig. 2).
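The clip-level workflow described in Methods can be illustrated with a minimal Python sketch. It is not the study's implementation: the function names, the handling of a shorter final clip, and the use of the population standard deviation as the variability measure are all assumptions made for illustration, since the abstract does not specify them. The ML model itself is not reproduced; clip-level subscores are taken as given.

```python
from collections import Counter
from statistics import pstdev


def clip_boundaries(start_s, end_s, clip_len_s):
    """Split the withdrawal interval [start_s, end_s) into consecutive clips.

    start_s: time of maximal extent (seconds), as marked by a reviewer.
    end_s:   end of the procedure (seconds).
    Assumption: a shorter final clip is kept rather than dropped.
    """
    if clip_len_s <= 0 or end_s <= start_s:
        raise ValueError("need clip_len_s > 0 and end_s > start_s")
    bounds = []
    t = start_s
    while t < end_s:
        bounds.append((t, min(t + clip_len_s, end_s)))
        t += clip_len_s
    return bounds


def summarize_clip_scores(scores):
    """Summarize clip-level endoscopy subscores (integers 0-3) for one video.

    Assumption: variability is reported as the population standard
    deviation; the abstract does not name the statistic that was used.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    counts = Counter(scores)
    n = len(scores)
    return {
        "spread": pstdev(scores),
        "proportions": {grade: counts.get(grade, 0) / n for grade in range(4)},
        "max_grade": max(scores),
    }


if __name__ == "__main__":
    # Hypothetical withdrawal from 100 s to 170 s, cut into 30-second clips.
    print(clip_boundaries(100, 170, 30))
    # Hypothetical clip-level subscores for one video.
    print(summarize_clip_scores([0, 1, 2, 2]))
```

Shorter clips produce more scores per video, so a summary of this kind can expose patchy inflammation that a single video-level score would average away.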
Conclusion
Using a novel ML model to determine video-level and clip-level endoscopy subscores from colonoscopy videos of patients with moderate-to-severe UC, we identified heterogeneity of inflammatory activity, which was best observed with 15-second clips. These findings provide insight into inter- and intra-rater variability of endoscopy subscore assessments by human readers and highlight relevant data that are not captured by a single video-level score. Further study is needed to optimize assessment of inflammation in this setting.

References
1. Food and Drug Administration. Ulcerative Colitis: Developing Drugs for Treatment [Internet]. 2022 [cited 2024 Oct 17]. Available from: https://www.fda.gov/regulatory-information/search-fda-guidance-documents/ulcerative-colitis-developing-drugs-treatment
2. Hashash, J. G., Yu Ci Ng, F., Farraye, F. A., Wang, Y., Colucci, D. R., Baxi, S., ... & Melmed, G. Y. (2024). Inter- and intraobserver variability on endoscopic scoring systems in Crohn's disease and ulcerative colitis: A systematic review and meta-analysis. Inflammatory Bowel Diseases, izae051.
3. Vuyyuru, S. K., Ma, C., Nguyen, T. M., Zou, G., Peyrin-Biroulet, L., Danese, S., ... & Jairath, V. (2024). Differential efficacy of medical therapies for ulcerative colitis according to disease extent: Patient-level analysis from multiple randomized controlled trials. EClinicalMedicine, 72.
4. Rubin, D. T., Gottlieb, K., Colombel, J. F., Schott, J. P., Erisson, L., Prucka, B., ... & McGill, J. (2023). Development of a novel ulcerative colitis endoscopic Mayo score prediction model using machine learning. Gastro Hep Advances, 2(7), 935-942.