Abstract

Endoscopic evaluation to reliably grade disease activity, detect complications including cancer and verification of mucosal healing are paramount in the care of patients with ulcerative colitis (UC); but this evaluation is hampered by substantial intra- and interobserver variability. Recently, artificial intelligence methodologies have been proposed to facilitate more objective, reproducible endoscopic assessment. In a first step, we compared how well several deep learning convolutional neural network architectures (CNNs) applied to a diverse subset of 8000 labeled endoscopic still images derived from HyperKvasir, the largest multi-class image and video dataset from the gastrointestinal tract available today. The HyperKvasir dataset includes 110,079 images and 374 videos and could (1) accurately distinguish UC from non-UC pathologies, and (2) inform the Mayo score of endoscopic disease severity. We grouped 851 UC images labeled with a Mayo score of 0–3, into an inactive/mild (236) and moderate/severe (604) dichotomy. Weights were initialized with ImageNet, and Grid Search was used to identify the best hyperparameters using fivefold cross-validation. The best accuracy (87.50%) and Area Under the Curve (AUC) (0.90) was achieved using the DenseNet121 architecture, compared to 72.02% and 0.50 by predicting the majority class (‘no skill’ model). Finally, we used Gradient-weighted Class Activation Maps (Grad-CAM) to improve visual interpretation of the model and take an explainable artificial intelligence approach (XAI).

Highlights

  • Endoscopic evaluation to reliably grade disease activity, detect complications including cancer and verification of mucosal healing are paramount in the care of patients with ulcerative colitis (UC); but this evaluation is hampered by substantial intra- and interobserver variability

  • HyperKvasir is an extension of the Kvasir dataset, collected from the same Bærum Hospital from 2008 to 2016, containing 110,079 images, 10,662 of which are labelled with 23 classes of f­indings[25]

  • We were able to achieve moderate to good performance in mild vs. moderate-to-severe UC on a relatively small public dataset of endoscopy images

Read more

Summary

Introduction

Endoscopic evaluation to reliably grade disease activity, detect complications including cancer and verification of mucosal healing are paramount in the care of patients with ulcerative colitis (UC); but this evaluation is hampered by substantial intra- and interobserver variability. Ulcerative colitis and Crohn’s disease, together referred to as inflammatory bowel disease(s) (IBD), are two chronic systemic inflammatory d­ isorders[1,2]. They result from an inappropriate immune response towards the commensal microbiota in genetically susceptible i­ndividuals[3]. Endoscopic scoring systems for ulcerative colitis are heterogeneous and subjective, with significant inter- and intra-observer variability; and are still not routinely used in clinical practice[12]. The available literature suggests that AI models are capable of being as accurate or superior to human experts at certain medical t­ asks[15–17]

Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call