Abstract
Video capsule endoscopy (VCE) is a revolutionary technology for the early diagnosis of gastric disorders. However, owing to the high redundancy and subtle manifestation of anomalies among thousands of frames, the manual construal of VCE videos requires considerable patience, focus, and time. The automatic analysis of these videos using computational methods is a challenge as the capsule is untamed in motion and captures frames inaptly. Several machine learning (ML) methods, including recent deep convolutional neural networks approaches, have been adopted after evaluating their potential of improving the VCE analysis. However, the clinical impact of these methods is yet to be investigated. This survey aimed to highlight the gaps between existing ML-based research methodologies and clinically significant rules recently established by gastroenterologists based on VCE. A framework for interpreting raw frames into contextually relevant frame-level findings and subsequently merging these findings with meta-data to obtain a disease-level diagnosis was formulated. Frame-level findings can be more intelligible for discriminative learning when organized in a taxonomical hierarchy. The proposed taxonomical hierarchy, which is formulated based on pathological and visual similarities, may yield better classification metrics by setting inference classes at a higher level than training classes. Mapping from the frame level to the disease level was structured in the form of a graph based on clinical relevance inspired by the recent international consensus developed by domain experts. Furthermore, existing methods for VCE summarization, classification, segmentation, detection, and localization were critically evaluated and compared based on aspects deemed significant by clinicians. Numerous studies pertain to single anomaly detection instead of a pragmatic approach in a clinical setting. The challenges and opportunities associated with VCE analysis were delineated. A focus on maximizing the discriminative power of features corresponding to various subtle lesions and anomalies may help cope with the diverse and mimicking nature of different VCE frames. Large multicenter datasets must be created to cope with data sparsity, bias, and class imbalance. Explainability, reliability, traceability, and transparency are important for an ML-based diagnostics system in a VCE. Existing ethical and legal bindings narrow the scope of possibilities where ML can potentially be leveraged in healthcare. Despite these limitations, ML based video capsule endoscopy will revolutionize clinical practice, aiding clinicians in rapid and accurate diagnosis.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.