UNet has achieved great success in medical image segmentation. However, due to the inherent locality of convolution operations, UNet is deficient in capturing global features and long-range dependencies of polyps, resulting in less accurate polyp recognition for complex morphologies and backgrounds. Transformers, with their sequential operations, are better at perceiving global features but lack low-level details, leading to limited localization ability. If the advantages of both architectures can be effectively combined, the accuracy of polyp segmentation can be further improved. In this paper, we propose an attention and convolution-augmented UNet-Transformer Network (ACU-TransNet) for polyp segmentation. This network is composed of the comprehensive attention UNet and the Transformer head, sequentially connected by the bridge layer. On the one hand, the comprehensive attention UNet enhances specific feature extraction through deformable convolution and channel attention in the first layer of the encoder and achieves more accurate shape extraction through spatial attention and channel attention in the decoder. On the other hand, the Transformer head supplements fine-grained information through convolutional attention and acquires hierarchical global characteristics from the feature maps. mcU-TransNet could comprehensively learn dataset features and enhance colonoscopy interpretability for polyp detection. Experimental results on the CVC-ClinicDB and Kvasir-SEG datasets demonstrate that mcU-TransNet outperforms existing state-of-the-art methods, showcasing its robustness.
Read full abstract