AUTOMATIC BLEEDING AND ULCER DETECTION FROM LIMITED QUALITY ANNOTATIONS IN ULCERATIVE COLITIS

Safaa Al-Ali, John Chaussard, Sébastien Li-Thiao-Té, Eric Ogier-Denis, Alice Percy-Du-Sert, Xavier Treton, Hatem Zaag

doi:10.1053/j.gastro.2021.12.045

Abstract

Ulcerative colitis is an idiopathic inflammatory disorder affecting the mucosa of the colon, with superficial erosions and ulcers associated with bleeding. Severity assessment using current scoring schemes such as UCEIS and MAYO relies on the subjective interpretation of the physician and fails to take into account the size of the different lesions, their number, and their distribution throughout the colon. Automatic lesion detection and grading procedures can enable fine-grained assessment of lesion severity for treatment follow-up. This work aims to learn automatic bleeding and ulcer lesion detectors. As such algorithms may need to be tuned to the characteristics of the equipment used to perform the colonoscopy, we train our detectors on a dataset obtained by the Gastroenterology group at the Bichat and Beaujon hospitals. The patients' videos were anonymized and analyzed after obtaining the patients' consent. The study was approved by the local research study committee. To minimize the expert annotation burden, only rectangular annotations are required instead of a precise delineation of lesion boundaries (Figure 1). As a consequence, this dataset contains many mislabeled pixels, especially in the corners of the rectangles. This affects the evaluation of the models' performance and our ability to find correct models: standard pixel-level sensitivity and specificity cannot be used effectively on this dataset. We propose to evaluate model sensitivity at the annotation level and keep specificity at the pixel level. On the training set, we consider that a model correctly identifies a lesion if it agrees with the expert on a subset of the annotation, and count the detected annotations weighted by their area. For robustness and ease of interpretation, we explore the set of linear classifiers, and propose an efficient sampling scheme that rejects trivial models. This method is evaluated on a database of 10 colonoscopy videos (5 training videos and 5 test videos).
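
The evaluation scheme proposed above (annotation-level sensitivity weighted by area, pixel-level specificity) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the rectangle encoding (row0, row1, col0, col1) and the `min_overlap` threshold are assumptions made for illustration; the abstract only states that a model must agree with the expert "on a subset of the annotation" and does not specify how large that subset must be, nor how the negative pixels are chosen (here: all pixels outside every rectangle).

```python
import numpy as np


def annotation_sensitivity(pred, boxes, min_overlap=0.1):
    """Area-weighted fraction of rectangular annotations hit by the prediction.

    pred        : 2-D boolean array, True where the model predicts a lesion.
    boxes       : list of (row0, row1, col0, col1) half-open rectangles.
    min_overlap : ASSUMED threshold, the fraction of a rectangle that must be
                  predicted positive for the annotation to count as detected.
    """
    detected_area = 0
    total_area = 0
    for r0, r1, c0, c1 in boxes:
        area = (r1 - r0) * (c1 - c0)
        total_area += area
        # Fraction of the rectangle on which model and expert agree.
        if pred[r0:r1, c0:c1].mean() >= min_overlap:
            detected_area += area
    return detected_area / total_area if total_area else float("nan")


def pixel_specificity(pred, boxes):
    """Fraction of pixels outside every rectangle that are predicted negative."""
    outside = np.ones(pred.shape, dtype=bool)
    for r0, r1, c0, c1 in boxes:
        outside[r0:r1, c0:c1] = False
    n_outside = int(outside.sum())
    if n_outside == 0:
        return float("nan")
    true_negatives = int((~pred & outside).sum())
    return true_negatives / n_outside


# Toy 10x10 frame with two annotations of area 16 and 25.
pred = np.zeros((10, 10), dtype=bool)
boxes = [(0, 4, 0, 4), (5, 10, 5, 10)]
pred[1:3, 1:3] = True   # covers 4 of the 16 pixels of the first rectangle
pred[9, 0] = True       # one false positive outside both rectangles

sens = annotation_sensitivity(pred, boxes)  # 16 / 41: only the first box is hit
spec = pixel_specificity(pred, boxes)       # 58 / 59: one false positive
```

In this toy frame the prediction covers only the center of the first rectangle, so a pixel-level sensitivity would penalize the unlabeled-looking corners, whereas the annotation-level count credits the whole rectangle area once the overlap criterion is met.
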
In spite of the limited quality of the annotations, we find lesion detectors with good annotation-level sensitivity (93% specificity / 89% sensitivity for bleeding and 57% specificity / 83% sensitivity for ulcers) and good visual performance (see Figure 1). The performance estimates are stable: we evaluated sensitivity and specificity on 20 random subsets, each containing 10% of the images, and obtained similar performance for the same patient for all models (cf. Figure 2). However, we observe that the inter-patient performance is variable and that the best models can fail on some patients. In Figure 2, the sensitivity is below 20% in some cases. This suggests that the models are not universal because bleeding and ulcers differ in appearance between patients, and that this variability should be corrected before automatic detection. Alternatively, models should be adapted to the patient characteristics.
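
The stability check on random subsets (20 subsets of 10% of the images) could be sketched as follows. This is an illustrative sketch only: the per-image tuple layout, the function name `subset_scores`, the fixed seed, and the choice to sample images without replacement within each subset are assumptions, not details given in the abstract.

```python
import random


def subset_scores(per_image, n_subsets=20, fraction=0.10, seed=0):
    """Sensitivity and specificity on random subsets of one patient's images.

    per_image : list of tuples (ASSUMED layout)
                (detected_area, annotated_area, true_negatives, negative_pixels)
    Returns a list of (sensitivity, specificity) pairs, one per subset.
    """
    rng = random.Random(seed)
    k = max(1, round(fraction * len(per_image)))
    scores = []
    for _ in range(n_subsets):
        # Images are drawn without replacement within a subset (assumption).
        sample = rng.sample(per_image, k)
        detected = sum(s[0] for s in sample)
        annotated = sum(s[1] for s in sample)
        true_neg = sum(s[2] for s in sample)
        negatives = sum(s[3] for s in sample)
        sens = detected / annotated if annotated else float("nan")
        spec = true_neg / negatives if negatives else float("nan")
        scores.append((sens, spec))
    return scores


# Toy data: 100 images with identical counts, so every subset scores the same.
images = [(8, 10, 90, 100)] * 100
scores = subset_scores(images)
```

A small spread of the returned pairs for a given patient and model would correspond to the "similar performance" reported above, while a large spread between patients would reflect the inter-patient variability.
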

Full Text