Abstract

Background
Limited endoscopy of the rectosigmoid colon is sometimes used to assess endpoints in clinical trials for ulcerative colitis (UC). However, endoscopic assessment of inflammation is subject to inherent inter- and intra-rater variability, and regulatory agencies have suggested that full colonoscopy is preferred [1–3]. We used a machine learning (ML) model to compare the degree of inflammation seen across full-length endoscopy videos with that seen in the distal colon and rectum.

Methods
We applied a previously developed ML model that predicts the endoscopy subscore [4], a component of the modified Mayo Score, to 49 full-length endoscopy videos from the Phase 3 induction trial of mirikizumab in UC (NCT03518086); the videos were randomly selected and stratified by endoscopic severity. A human reviewer preprocessed each video to identify the point of maximal extent and the end of the procedure. The withdrawal portion of each video was divided into a series of shorter segments (clips) of 15, 30, or 60 seconds, and the ML model generated an endoscopy subscore for each clip (clip-level endoscopy subscore) and for the entire video (video-level endoscopy subscore). The final two clips of the withdrawal were taken to represent the sigmoid colon and rectum, so this analysis compared the ML endoscopy subscores of those two clips with the video-level endoscopy subscore. Agreement between scores was assessed with kappa statistics.

Results
There was substantial agreement between the video-level endoscopy subscore and the subscores of the last two clips (Table 1). For 60-second clips, the agreement rate between the video-level endoscopy subscore and clip 1 was 0.67, and with clip 2 it was 0.76. When the video-level endoscopy subscore was compared with either clip 1 or clip 2, the agreement rate was 0.86, which was higher than the agreement rate obtained using the higher of the two clip grades (0.73).
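The clip-level comparison described in the Methods (split the withdrawal into fixed-length clips, keep the final two, compare their subscores with the video-level subscore) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the study's pipeline: all function names are hypothetical, the ML model itself is not reproduced (subscores are supplied as integers 0–3), a shorter trailing clip is assumed to be kept, "either clip" agreement is assumed to mean that at least one of the two clips matches, and unweighted Cohen's kappa is assumed because the abstract does not state which kappa variant was used. Because the abstract does not say which of the final two clips is "clip 1", the sketch refers to them only as the penultimate and final clips.

```python
from collections import Counter
from typing import Sequence

# Hypothetical helpers for illustration only; not the authors' implementation.


def split_withdrawal(max_extent_s: float, end_s: float, clip_len_s: float) -> list[tuple[float, float]]:
    """Split the withdrawal (point of maximal extent -> end of procedure) into clips.

    Assumption: a shorter trailing clip is kept rather than dropped.
    """
    clips = []
    start = max_extent_s
    while start < end_s:
        stop = min(start + clip_len_s, end_s)
        clips.append((start, stop))
        start = stop
    return clips


def agreement_rate(video: Sequence[int], clip: Sequence[int]) -> float:
    """Fraction of videos whose clip-level subscore equals the video-level subscore."""
    return sum(v == c for v, c in zip(video, clip)) / len(video)


def either_agreement(video: Sequence[int], clip_a: Sequence[int], clip_b: Sequence[int]) -> float:
    """Fraction of videos where at least one of the two clips matches (assumed definition)."""
    return sum(v in (a, b) for v, a, b in zip(video, clip_a, clip_b)) / len(video)


def max_agreement(video: Sequence[int], clip_a: Sequence[int], clip_b: Sequence[int]) -> float:
    """Fraction of videos where the higher of the two clip subscores matches."""
    return sum(v == max(a, b) for v, a, b in zip(video, clip_a, clip_b)) / len(video)


def cohen_kappa(rater_a: Sequence[int], rater_b: Sequence[int]) -> float:
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(rater_a)
    p_o = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from each rater's marginal category frequencies.
    p_e = sum(counts_a[k] * counts_b[k] for k in counts_a.keys() | counts_b.keys()) / n**2
    if p_e == 1.0:
        return float("nan")  # kappa is undefined when both use one identical category
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Toy subscores (0-3) for five videos; illustrative only, not trial data.
    video = [0, 1, 2, 3, 2]
    penultimate = [0, 1, 1, 3, 3]
    final = [0, 2, 2, 2, 2]
    print("last two 60 s clips:", split_withdrawal(100.0, 250.0, 60.0)[-2:])
    print("penultimate clip:", agreement_rate(video, penultimate))
    print("final clip:", agreement_rate(video, final))
    print("either clip:", either_agreement(video, penultimate, final))
    print("higher of two:", max_agreement(video, penultimate, final))
    print("kappa, final clip:", round(cohen_kappa(video, final), 3))
```

The toy data show why the "either clip" rule can agree more often than the "higher of the two" rule: taking the maximum can overshoot the video-level grade when only the lower-graded clip matches it.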
The 60-second clips had higher agreement rates than the 30-second or 15-second clips (Tables 1 and 2). The endoscopy subscores of clip 1 or clip 2 showed excellent agreement with video-level grades 0–3 (Table 2), and this was best observed with 60-second clips (Table 2A, kappa 0.944).

Conclusion
The ML model demonstrated excellent agreement between the clip-level endoscopy subscores of the most distal portion of the withdrawal videos and the video-level endoscopy subscore, and this was best observed with 60-second clips. These findings support the use of distal colon and rectum assessment to determine the grade of endoscopic inflammation for the full colonoscopy. We propose further prospective study of this ML model in this setting.

References
1. Feagan, B. G., Khanna, R., Sandborn, W. J., Vermeire, S., Reinisch, W., Su, C., ... & Sands, B. E. (2021). Agreement between local and central reading of endoscopic disease activity in ulcerative colitis: Results from the tofacitinib OCTAVE trials. Alimentary Pharmacology & Therapeutics, 54(11-12), 1442-1453.
2. Hashash, J. G., Yu Ci Ng, F., Farraye, F. A., Wang, Y., Colucci, D. R., Baxi, S., ... & Melmed, G. Y. (2024). Inter- and intraobserver variability on endoscopic scoring systems in Crohn's disease and ulcerative colitis: A systematic review and meta-analysis. Inflammatory Bowel Diseases, izae051.
3. Food and Drug Administration. (2022). Ulcerative colitis: Developing drugs for treatment. Retrieved October 17, 2024, from https://www.fda.gov/regulatory-information/search-fda-guidance-documents/ulcerative-colitis-developing-drugs-treatment
4. Rubin, D. T., Gottlieb, K., Colombel, J. F., Schott, J. P., Erisson, L., Prucka, B., ... & McGill, J. (2023). Development of a novel ulcerative colitis Endoscopic Mayo Score prediction model using machine learning. Gastro Hep Advances, 2(7), 935-942.