Abstract

Background: Endoscopic assessment is a core component of grading disease severity in ulcerative colitis (UC), but its subjectivity threatens accuracy and reproducibility. We aimed to develop and test a fully automated video analysis system for grading endoscopic disease severity in UC.

Methods: A developmental dataset of local high-resolution UC colonoscopy videos was generated, with Mayo endoscopic scores (MES) provided by experienced local reviewers. Videos were converted into still-image stacks and annotated both for image quality sufficient for scoring (informativeness) and for MES grade (Mayo 0, 1, 2, or 3). Convolutional neural networks (CNNs) were trained with 5-fold cross-validation to predict still-image informativeness and disease severity grade. Whole-video MES models were developed by matching reviewer MES scores to the proportions of predicted still-image scores within each video, using a template-matching grid search. The automated whole-video MES workflow was tested on a separate set of endoscopic videos from an international multicenter UC clinical trial (LYC-30937-EC). Agreement was assessed with Cohen's kappa coefficient using quadratic weighting.

Results: The developmental set included 51 high-resolution videos (Mayo 2 or 3: 41.2%); the multicenter clinical trial set contained 264 videos from 157 subjects (Mayo 2 or 3: 83.7%; p < .0001). Across 34,810 frames, the still-image informativeness classifier performed excellently, with an AUC of 0.961, sensitivity of 0.902, and specificity of 0.870. In the high-resolution videos, agreement between reviewers and the fully automated MES was very good: the exact MES was predicted correctly in 78% of videos (40/51; κ = 0.84, 95% CI 0.75–0.92) (Figure 1). In the external clinical trial videos that underwent dual central review, the two reviewers agreed on the exact MES in 82.8% of videos (140/169; κ = 0.78, 95% CI 0.71–0.86). Automated MES grading of the clinical trial videos (often low resolution) correctly distinguished Mayo 0–1 from Mayo 2–3 in 83.7% of videos (221/264). The automated system and the central reviewer agreed on the exact MES in 57.1% of videos (κ = 0.59, 95% CI 0.46–0.71), which rose to 69.5% when human reviewer disagreement was taken into account. The automated MES was within one level of the central score in 93.5% of videos (247/264).

Figure 1: Ordinal characteristics of the automated process for predicting progressively increasing disease severity. TPR, true positive rate; FPR, false-positive rate.

Conclusion: Although premature for immediate deployment, these early results support the feasibility of artificial intelligence approaching expert-level endoscopic disease grading in UC.
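The agreement statistic and the whole-video scoring step described in the Methods can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the authors' implementation. `quadratic_weighted_kappa` follows the standard definition of Cohen's kappa with quadratic weights. The abstract does not describe the templates used in its grid search, so `video_mes` substitutes a hypothetical threshold rule: pick the highest grade g whose share of frames graded g or higher reaches a threshold. All function names, the threshold grid, and the choice of weighted kappa as the search objective are illustrative assumptions. Frame grades are assumed to be integers 0 to 3, produced by an upstream CNN for informative frames only.

```python
from collections import Counter
from itertools import product

N_GRADES = 4  # Mayo endoscopic subscores 0, 1, 2, 3


def quadratic_weighted_kappa(rater_a, rater_b, n_classes=N_GRADES):
    """Cohen's kappa with quadratic weights for two equal-length lists of labels 0..n_classes-1."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(rater_a)
    # Observed confusion matrix: rows are rater A's labels, columns are rater B's.
    observed = [[0] * n_classes for _ in range(n_classes)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1
    # Marginal label counts for each rater.
    hist_a = [sum(observed[i]) for i in range(n_classes)]
    hist_b = [sum(observed[i][j] for i in range(n_classes)) for j in range(n_classes)]
    num = den = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            weight = (i - j) ** 2 / (n_classes - 1) ** 2  # quadratic disagreement penalty
            expected = hist_a[i] * hist_b[j] / n  # count expected under chance agreement
            num += weight * observed[i][j]
            den += weight * expected
    # den == 0 only when both raters gave every item the same single label;
    # kappa is undefined there, and returning 1.0 is a convention of this sketch.
    return 1.0 - num / den if den else 1.0


def video_mes(frame_grades, thresholds):
    """HYPOTHETICAL rule: highest grade g (checked 3, 2, 1) whose share of frames
    graded >= g reaches thresholds[g - 1]; otherwise grade 0."""
    if not frame_grades:
        raise ValueError("video has no informative frames")
    counts = Counter(frame_grades)
    n = len(frame_grades)
    for g in range(N_GRADES - 1, 0, -1):
        share = sum(counts[k] for k in range(g, N_GRADES)) / n
        if share >= thresholds[g - 1]:
            return g
    return 0


def grid_search_thresholds(videos, reviewer_mes, grid):
    """Try every threshold triple from `grid`; keep the first one with the highest kappa."""
    best_kappa, best_t = float("-inf"), None
    for t in product(grid, repeat=N_GRADES - 1):
        preds = [video_mes(frames, t) for frames in videos]
        kappa = quadratic_weighted_kappa(reviewer_mes, preds)
        if kappa > best_kappa:
            best_kappa, best_t = kappa, t
    return best_kappa, best_t


if __name__ == "__main__":
    # Synthetic toy data (not from the study): per-frame CNN grades for four videos.
    videos = [[0, 0, 0, 1], [1, 1, 0, 1], [2, 2, 1, 2], [3, 3, 2, 3]]
    reviewer = [0, 1, 2, 3]
    print(grid_search_thresholds(videos, reviewer, grid=(0.25, 0.5, 0.75)))
    # prints (1.0, (0.5, 0.25, 0.25))
```

Because quadratic weights penalize a two-level miss four times as heavily as a one-level miss, this objective favors thresholds that keep errors within one grade. Thresholds fitted this way should be evaluated on held-out videos, as the abstract did with the separate clinical trial set; scoring them on the videos used for fitting would overstate agreement.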
