Abstract

Colorectal cancer (CRC) has become one of the most frequent cancers in the world. To prevent CRC, accurate polyp localization in endoscopy images plays a vital role in detecting and removing colorectal polyps. Most polyp segmentation methods use convolutional neural networks (CNNs) as their backbone and have achieved promising results, effectively assisting clinicians in their diagnosis. However, these CNN-based approaches have limitations in modeling the accurate location and shape of polyps, owing to the intrinsic locality of convolutional operations. To address these limitations, this study proposes a novel network, Att-PVT, that combines a CNN and a Pyramid Vision Transformer (PVT) for polyp segmentation. The main challenge lies in maintaining long-range semantic information without sacrificing low-level features. Att-PVT applies multidimensional information extraction (MIE) to refine the feature maps extracted from the PVT for better feature representation. Cascaded context integration (CCI) is designed to adaptively aggregate the three highest layers of refined polyp features to learn semantic and location information. To accurately segment colorectal polyps, Att-PVT introduces a multilevel feature fusion (MFF) module that exploits boundary information in the shallower layer based on the global map. The proposed workflow was evaluated in comparative experiments on three public datasets, namely Kvasir, ColonDB, and ETIS. The results show that the proposed approach achieves mDice scores of 0.926, 0.817, and 0.794 for polyp segmentation on these datasets, surpassing other state-of-the-art methods and indicating the superior generalization and scalability of the proposed approach.
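
The sketch below shows one way the pipeline described above could be wired in PyTorch: a multi-stage backbone feeds per-stage MIE refinement, CCI fuses the three deepest refined features into a coarse global map, and MFF combines that map with the shallowest refined feature for boundary-aware prediction. It is a conceptual sketch only; the module internals, channel sizes, class names, and the stand-in CNN backbone (used here in place of a pretrained PVT so the example is self-contained) are all assumptions not specified by the abstract.

```python
# Conceptual sketch of the Att-PVT pipeline described in the abstract (not the authors' code).
# All module designs and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MIE(nn.Module):
    """Multidimensional information extraction: refine one backbone feature map (assumed design)."""
    def __init__(self, in_ch, out_ch=64):
        super().__init__()
        self.reduce = nn.Conv2d(in_ch, out_ch, 1)
        self.refine = nn.Sequential(
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True)
        )
    def forward(self, x):
        return self.refine(self.reduce(x))

class CCI(nn.Module):
    """Cascaded context integration: fuse the three deepest refined features into a global map."""
    def __init__(self, ch=64):
        super().__init__()
        self.fuse = nn.Conv2d(3 * ch, ch, 3, padding=1)
        self.head = nn.Conv2d(ch, 1, 1)
    def forward(self, f2, f3, f4):
        size = f2.shape[2:]
        f3 = F.interpolate(f3, size=size, mode="bilinear", align_corners=False)
        f4 = F.interpolate(f4, size=size, mode="bilinear", align_corners=False)
        return self.head(self.fuse(torch.cat([f2, f3, f4], dim=1)))  # coarse polyp location map

class MFF(nn.Module):
    """Multilevel feature fusion: sharpen boundaries using the shallow feature and the global map."""
    def __init__(self, ch=64):
        super().__init__()
        self.fuse = nn.Conv2d(ch + 1, ch, 3, padding=1)
        self.head = nn.Conv2d(ch, 1, 1)
    def forward(self, shallow, global_map):
        g = F.interpolate(global_map, size=shallow.shape[2:], mode="bilinear", align_corners=False)
        return self.head(self.fuse(torch.cat([shallow, g], dim=1)))

class AttPVTSketch(nn.Module):
    def __init__(self, backbone_channels=(64, 128, 320, 512)):
        super().__init__()
        # A pretrained Pyramid Vision Transformer would normally supply this four-stage
        # feature pyramid; a tiny strided CNN stands in so the sketch runs on its own.
        self.backbone = nn.ModuleList([
            nn.Conv2d(3 if i == 0 else backbone_channels[i - 1], c, 3,
                      stride=2 if i else 4, padding=1)
            for i, c in enumerate(backbone_channels)
        ])
        self.mie = nn.ModuleList([MIE(c) for c in backbone_channels])
        self.cci = CCI()
        self.mff = MFF()
    def forward(self, x):
        feats = []
        for stage in self.backbone:
            x = stage(x)
            feats.append(x)
        r1, r2, r3, r4 = [m(f) for m, f in zip(self.mie, feats)]
        global_map = self.cci(r2, r3, r4)        # semantic + location cues from deep layers
        refined_map = self.mff(r1, global_map)   # boundary refinement in the shallow layer
        return F.interpolate(refined_map, scale_factor=4, mode="bilinear", align_corners=False)

if __name__ == "__main__":
    pred = AttPVTSketch()(torch.randn(1, 3, 352, 352))
    print(pred.shape)  # torch.Size([1, 1, 352, 352])
```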
