Wireless capsule endoscopy (WCE) has revolutionized the field of gastrointestinal examinations, with Medtronic™ WCE systems among the most widely used in clinics. In WCE videos, medical experts use the RAPID READER™ tool to annotate findings. However, the frame annotations are not available in an open format and, when exported, they have different resolutions and contain annotation artefacts that hinder their localization in the original videos. This hinders the use of medical experts' WCE annotations in research on new computer-aided diagnosis (CAD) methods. In this paper, we propose a methodology to compare image similarities and evaluate it on a private Medtronic™ WCE SB3 video dataset to automatically identify the annotated frames in the videos. We used state-of-the-art pre-trained convolutional neural network (CNN) models, including MobileNet, InceptionResNetv2, ResNet50v2, VGG19, VGG16, ResNet101v2, ResNet152v2, and DenseNet121, as frame feature extractors, and compared the extracted features using the Euclidean distance. We evaluated the methodology on a private dataset consisting of 100 WCE videos, totalling 905 annotated frames. The experimental results showed promising performance. The MobileNet model achieved an accuracy of 94% for identifying the first match, while the top 5, top 10, and top 20 matches were identified with accuracies of 94%, 94%, and 98%, respectively. The VGG16 and ResNet50v2 models also demonstrated strong performance, achieving accuracies ranging from 88% to 93% for various match positions. These results highlight the effectiveness of our proposed methodology in localizing target frames and even identifying similar frames, which is very useful for training data-driven models in CAD research. The code utilized in this experiment is available on GitHub†.
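The matching step described above can be sketched as follows, assuming each frame has already been reduced to a fixed-length feature vector by a pre-trained CNN backbone (e.g., pooled MobileNet activations; the exact layer, preprocessing, and function names here are illustrative assumptions, not the paper's released code):

```python
import numpy as np

def rank_matches(query_feat, video_feats, top_k=5):
    """Rank video frames by Euclidean distance to an annotated query frame.

    query_feat:  1-D feature vector of the exported annotation frame,
                 e.g. from a pre-trained CNN such as MobileNet.
    video_feats: 2-D array (n_frames, d) of feature vectors for every
                 frame of the original video, from the same extractor.
    Returns the indices of the top_k closest frames (best match first).
    """
    # Euclidean distance between the query and every video frame.
    dists = np.linalg.norm(video_feats - query_feat, axis=1)
    # Smallest distance = most similar frame.
    return np.argsort(dists)[:top_k].tolist()
```

The first returned index corresponds to the "first match" accuracy reported above, and widening `top_k` to 5, 10, or 20 corresponds to the top-5/10/20 accuracies.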