Abstract

Introduction: Adequate withdrawal phase (WP) time is a quality metric for colonoscopy exams. When WP videos are analyzed frame by frame by human experts, 20-30% of frames contain no interpretable data due to poor visual quality; the "effective WP time" is therefore shorter than the total WP time. Manual interpretation of thousands of frames from WP videos is not practical. However, a convolutional neural network (CNN; Figure 1), a type of artificial intelligence application for medical imaging, can be pre-trained on a large set of labeled natural images and fine-tuned for this purpose.

Aims: 1) Can an endoscopist's image interpretation of the WP be used to train a CNN to detect poor-quality visual data? 2) How accurate is the CNN at detecting poor-quality visual data?

Figure 1: The deep convolutional neural network (CNN) architecture. Data flow is from left to right: a colonoscopy image is fed into the AlexNet CNN architecture and sequentially mapped to a probability distribution over poor and non-poor colonoscopy images. The CNN is pre-trained on the ImageNet dataset and fine-tuned on the "training cohort" colonoscopy dataset.

Methods: Ten WP videos of screening colonoscopy exams were included in the "training cohort". Every 200th frame from each video was analyzed by one expert endoscopist to represent the entire WP. Each frame was rated as "poor" (<50% interpretable; Figure 2) or "adequate" (≥50% interpretable; Figure 3). Two of these videos were used to train two trainee endoscopists, who then analyzed the other eight videos; kappa values were obtained for agreement statistics. The same cohort was used to fine-tune the CNN. Subsequently, five endoscopists (two experts and three trainees), all blinded to one another, read a different set of ten videos in the "validation cohort". The CNN output for this cohort was a binary classification of each image as poor or adequate. Sensitivity (Sens), specificity (Spec), predictive values (PPV, NPV), and kappa values were obtained for the CNN.
Frames marked poor by four or more of the five endoscopists were considered the gold-standard "poor" set for statistical analysis.

Figure 2: Representative image of a "poor" frame, where <50% of the visual data is interpretable.

Figure 3: Representative image of an "adequate" frame, where >50% of the visual data is interpretable.

Results: In the training cohort, the kappa value for agreement on poor images between the expert and trainees was 0.65 (substantial agreement). In the validation cohort, the kappa score between the endoscopists and the CNN for poor images was 0.76 (excellent agreement). Per the gold standard, 23% of the data (234/1005 frames) was of poor quality. Sens, Spec, PPV, and NPV of the CNN were 78%, 95%, 85%, and 93%, respectively.

Conclusion: It is feasible for machine learning, such as a CNN, to be employed to determine effective WP time with acceptable accuracy. For endoscopists with a low ADR, or for GI fellows in training, effective WP time can be one of the metrics to target, for example by improving hand-eye coordination or other training methods, with potential real-time feedback.
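The evaluation pipeline described above (≥4-of-5 gold-standard rule, then Sens, Spec, PPV, NPV, and Cohen's kappa against that gold standard) can be sketched as below. The counts in the usage line are made-up placeholders, not the study's confusion matrix.

```python
def majority_gold_standard(poor_votes, threshold=4):
    """A frame is gold-standard 'poor' if at least `threshold` of the
    endoscopists (4 of 5 in the abstract) rated it poor.
    poor_votes: iterable of 0/1 ratings, one per endoscopist."""
    return sum(poor_votes) >= threshold

def binary_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV, NPV, and Cohen's kappa from a
    2x2 confusion matrix (CNN vs. gold standard; 'poor' = positive)."""
    n = tp + fp + fn + tn
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    # Cohen's kappa: observed agreement corrected for chance agreement.
    po = (tp + tn) / n
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (po - pe) / (1 - pe)
    return sens, spec, ppv, npv, kappa

# Placeholder counts for illustration only:
sens, spec, ppv, npv, kappa = binary_metrics(tp=180, fp=30, fn=50, tn=740)
```

With the real per-frame labels, `tp` would count frames the CNN and the gold standard both call poor, and so on; the five reported statistics then fall out of the single confusion matrix.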
