Background: Probe-based confocal laser endomicroscopy (pCLE) enables dynamic imaging of the gastrointestinal epithelium in vivo during ongoing endoscopy and, as of today, relies on the endoscopist for image interpretation. The subjective nature of pCLE video semantics suggests the need for a standardized, more automated method of image sequence interpretation.

Aims: To support the diagnosis of a newly acquired pCLE video, we aim to retrieve from a training database videos that are similar in appearance to the video of interest and that have been previously diagnosed by expert physicians with confirmed histology. As a model system, we used the distinction between adenomatous and hyperplastic colorectal polyps.

Methods: Sixty-eight patients underwent colonoscopy with pCLE for fluorescein-aided imaging of suspicious colonic polyps before their removal. The resulting database comprises 121 videos (36 non-neoplastic, 85 neoplastic) and 499 edited video sub-sequences (231 non-neoplastic, 268 neoplastic), annotated by clinical experts with a pathological diagnosis. To quantify the relevance of video retrieval, we performed an unbiased classification with leave-one-patient-out cross-validation, based on the votes of the k most similar videos. The Bag-of-Visual-Words method from computer vision extracts local continuous image features and clusters them into a finite number of visual words to build an efficient image signature. To retrieve videos rather than isolated images, we revisited this method and analyzed the impact of including spatial overlap between time-related images. We first used the results of a video-mosaicing technique to weight the contribution of each local image region to its visual word. We then computed the video signatures with a histogram summation technique, which reduces both retrieval runtime and training memory.
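The signature construction described above can be sketched as follows. This is a minimal illustration on toy data, not the authors' implementation: the descriptor extraction, vocabulary size, and the overlap-based weights (which in the actual method come from video mosaicing) are all placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def build_vocabulary(descriptors, n_words=8, n_iter=10):
    # Minimal k-means: cluster local image descriptors into visual words.
    centroids = descriptors[rng.choice(len(descriptors), n_words, replace=False)]
    for _ in range(n_iter):
        labels = np.argmin(((descriptors[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
        for w in range(n_words):
            members = descriptors[labels == w]
            if len(members):
                centroids[w] = members.mean(axis=0)
    return centroids

def image_signature(descriptors, centroids, weights=None):
    # Histogram of visual-word assignments for one image. The optional
    # per-descriptor weights stand in for the mosaicing-derived weighting
    # of each local region's contribution to its visual word.
    labels = np.argmin(((descriptors[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
    return np.bincount(labels, weights=weights, minlength=len(centroids))

def video_signature(per_frame_descriptors, centroids, per_frame_weights=None):
    # Histogram summation: the video signature is the normalized sum of its
    # image signatures, so retrieval compares one vector per video instead
    # of many per-frame vectors (reducing runtime and training memory).
    if per_frame_weights is None:
        per_frame_weights = [None] * len(per_frame_descriptors)
    h = sum(image_signature(d, centroids, w)
            for d, w in zip(per_frame_descriptors, per_frame_weights))
    return h / h.sum()

# Toy video: 5 frames, each with 20 local descriptors of dimension 16.
frames = [rng.normal(size=(20, 16)) for _ in range(5)]
vocab = build_vocabulary(np.vstack(frames), n_words=8)
sig = video_signature(frames, vocab)
```

A query video's signature can then be compared (e.g., by histogram distance) against the signatures of the training videos, and the diagnoses of the k nearest ones vote on the query's class.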
Results: When using the votes of the k=9 most similar videos, our method achieves a sensitivity of 97.7% and a specificity of 86.1%, for a resulting accuracy of 94.2%. Compared to using the still images independently, using video data improves the results in a statistically significant manner (McNemar's test: p-value=0.021 with the votes of the k=3 most similar videos). Moreover, fewer similar videos are needed to classify the query at a given accuracy, which is clinically relevant for the physician.

Conclusion: Our method, which uses the results of video mosaicing for content-based video retrieval, appears to be highly accurate for pCLE videos. It may provide the endoscopist with diagnostic decision support and avoid unnecessary polypectomy of non-neoplastic lesions.