Abstract

Purpose: Localization and screening of the target tissue is a key prerequisite of numerous medical procedures, including capsule endoscopy, colonoscopy, and histology. Convolutional Neural Networks (CNNs), which stack convolutional, down-sampling, and up-sampling operators in an encoder-decoder fashion, have been the de facto standard and have shown great promise in recent years. The main deficiency of these models is the locality of their convolutional operators, which degrades accuracy, especially for targets with long-range dependencies. While CNNs excel at local feature extraction, Transformers are known for their ability to capture long-range dependencies. However, hybrid CNN-Transformer models employ complex attention mechanisms for fusion, which can increase model complexity and the risk of overfitting or underfitting on many datasets.
Methods: In this paper, we propose an efficient context-aware CNN-Transformer fusion mechanism based on a Semi-supervised Spatial and Global Attention mechanism (SSG-Att). Our model is designed to combine the strengths of both model families and overcome their limitations. High-level features extracted from the two parallel branches are fused using the proposed SSG-Att mechanism. A hybrid loss function, better adapted to the introduced fusion system, is also employed.
Results: We evaluated the proposed model on Kvasir-SEG, a polyp segmentation and detection dataset, and on the Gland Segmentation (GlaS) dataset. The experimental results confirmed that the improvements yield a top-performing yet efficient deep fused CNN-Transformer architecture. The proposed model outperformed the best-reported accuracies, achieving improved Dice scores of 92.11 ± 1.10% and 91.16 ± 0.81% on the Kvasir-SEG and GlaS datasets, respectively.
Conclusion: We conclude that the proposed context-aware fusion mechanism has the potential to support screening and localization applications more reliably and accurately than other state-of-the-art methods.
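
For readers unfamiliar with the metric behind the reported results, the following is a minimal sketch of the standard Dice coefficient for binary segmentation masks, 2|A ∩ B| / (|A| + |B|). It is not the authors' evaluation code: the function name `dice_score`, the flat-list mask representation, and the convention of returning 1.0 for two empty masks are all assumptions made for illustration.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks.

    Both masks are flat sequences of 0/1 values (an assumed representation;
    real pipelines usually operate on image tensors).
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    # Pixels marked as foreground in both masks.
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    if total == 0:
        # Both masks empty: treated as perfect agreement (assumed convention).
        return 1.0
    return 2.0 * intersection / total


# Toy example: 3 predicted pixels, 3 ground-truth pixels, 2 in common.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
print(dice_score(pred, target))
```

In the toy example the overlap is 2 pixels and the two masks contain 3 foreground pixels each, so the score is 2·2 / (3 + 3) = 2/3. A score reported as a percentage, as in the abstract, is this value multiplied by 100.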
