Abstract

Background
There is considerable inter-user variability in endoscopic and histologic scoring of the severity of ulcerative colitis (UC). The aim of this study was to create and train an artificial intelligence (AI) algorithm to predict the severity of UC from endoscopic videos.

Methods
This prospective cohort study recruited UC patients undergoing flexible sigmoidoscopy or colonoscopy between October 2023 and September 2024. The endoscopic examination was recorded from the splenic flexure to the anus. The videos were segmented into 3-second tubelets, and poor-quality videos were excluded. Objective disease activity was quantified by an experienced IBD gastroenterologist, who assigned individual endoscopic Mayo Scores (eMS), and corresponding biopsies were scored according to the Nancy Histologic Index (NHI) by one of three gastrointestinal histopathologists. The dataset was divided into a training dataset (38 patients, 88% of video data) and a test dataset (6 patients, 12% of data), stratified by disease severity and equipment source. Two video vision transformer models were trained to recognise eMS and NHI respectively. They were evaluated on their ability to differentiate disease activity from remission and on their accuracy in determining endoscopic and histologic severity scores. Standard diagnostic accuracy measures were used to assess the models' ability to detect and quantify UC disease activity: area under the receiver operating characteristic curve (AUROC), binary and multiclass accuracy, sensitivity, specificity, and precision.

Results
Fifty-three patients were recruited, and after exclusions 44 were included in the final analysis. Thirty patients had endoscopic activity and 30 had histological activity. The eMS and NHI showed moderate correlation (R = 0.72).
For eMS, the model performed excellently in identifying disease activity (eMS≥1), with an AUROC of 0.98 and accuracy, sensitivity, specificity, and precision of 92%, 86%, 96%, and 93% respectively. Multiclass accuracy was 87%, with a macro F1 score of 0.85. For NHI, the model was highly accurate in detecting histopathologic activity (NHI≥1), with an AUROC of 0.95 and accuracy, sensitivity, specificity, and precision of 89%, 92%, 86%, and 85% respectively. However, it was less robust in predicting specific NHI scores, with a multiclass accuracy of 69% and a macro F1 score of 0.41.

Conclusion
We developed an AI algorithm that can reliably predict endoscopic and histologic disease activity in patients with UC, and accurately differentiate those patients from those in remission. This may reduce clinician reliance on biopsies to determine disease activity, and may have far-reaching implications for UC disease assessment.

References
Hashash JG, Yu Ci Ng F, Farraye FA, Wang Y, Colucci DR, Baxi S, Muneer S, Reddan M, Shingru P, Melmed GY. Inter- and Intraobserver Variability on Endoscopic Scoring Systems in Crohn's Disease and Ulcerative Colitis: A Systematic Review and Meta-Analysis. Inflamm Bowel Dis. 2024 Mar 28:izae051. doi: 10.1093/ibd/izae051
Marchal-Bressenot A, Salleron J, Boulagnon-Rombi C, et al. Development and validation of the Nancy histological index for UC. Gut 2017;66:43-49.
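The diagnostic accuracy measures named in the Methods can be illustrated with a minimal sketch. It assumes per-sample integer severity grades and the activity threshold of grade ≥ 1 used for both eMS and NHI in the abstract. The function names and the toy grades below are hypothetical and are not study data or the authors' evaluation code. AUROC is left out because it requires the model's continuous output scores rather than predicted grades.

```python
def to_binary(grades, threshold=1):
    """Map severity grades to 0 (remission) or 1 (active), active if grade >= threshold."""
    return [1 if g >= threshold else 0 for g in grades]


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, and precision for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,  # recall on active cases
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,  # recall on remission cases
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
    }


def multiclass_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted grade equals the reference grade."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare grades count as much as common ones."""
    classes = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical toy grades (illustration only, not data from the study).
true_grades = [0, 0, 0, 1, 1, 2, 2, 3]
pred_grades = [0, 0, 1, 1, 2, 2, 2, 3]

activity = binary_metrics(to_binary(true_grades), to_binary(pred_grades))
grade_accuracy = multiclass_accuracy(true_grades, pred_grades)
grade_macro_f1 = macro_f1(true_grades, pred_grades)
```

Because macro F1 averages over grades without weighting by frequency, a model that rarely predicts the less common grades correctly can score a low macro F1 even when its overall multiclass accuracy is moderate, which is one way a gap such as the one reported for NHI can arise.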