Accurate polyp segmentation from colonoscopy images is important for the immediate diagnosis and effective treatment of colon cancer. While significant progress has been made on the polyp segmentation task, several challenges remain. Polyps vary greatly in size and shape, and often have no clear boundary with the surrounding tissue. Furthermore, surgical instrument segmentation can aid surgeons with the precise positioning and orientation of instruments, helping them plan the next steps in robot-assisted surgery. The proposed colorectal polyp segmentation Transformer (CPS-Former) uses innovative attention blocks in an encoder-decoder network similar to the classic Semantic Segmentation Network (SegNet). However, it has special self-attention modules with small convolutional kernels that efficiently extract information from different feature channels. Moreover, it is equipped with an effective positional embedding that captures information from a large area of context for long-range interactions. Additionally, an embedded scaling-attention fusion block combines the outputs of the encoder and decoder blocks to enhance semantic features and suppress non-semantic ones. The Transformer encoder blocks are also modified by adding a local feedforward layer and skip connections, and the channel sizes are adjusted to reduce the number of trainable parameters. We evaluate CPS-Former on four public colorectal polyp datasets and one surgical instrument segmentation dataset; the results show its superiority over other state-of-the-art polyp segmentation models. Our implementation source code and network weights are available on GitHub: https://github.com/ahmedeqbal/CPS-Former.