Early and accurate segmentation of medical images provides valuable information for treatment planning. In recent years, automatic and accurate segmentation of polyps in colonoscopy images has received extensive attention from the artificial intelligence and computer vision research communities, and many researchers have conducted in-depth studies of models based on CNNs and Transformers. However, CNNs have a limited ability to model long-range dependencies, which makes it difficult to fully exploit the semantic information in an image; on the other hand, the quadratic computational complexity of self-attention poses a challenge for Transformers. Recently, state-space models (SSMs) such as Mamba have emerged as a promising alternative: they not only excel at modeling long-range interactions but also retain linear computational complexity. Inspired by Mamba, we propose DCSS-UNet, which employs the Visual State Space (VSS) blocks of VMamba to capture wide-ranging contextual information. In the skip-connection stage, we propose the Skip Connection Feature Attention (SFA) module to better propagate information from the encoder. In the decoder stage, we incorporate the Temporal Fusion Attention Module (TFAM) to effectively fuse feature information. In addition, we adopt the Tversky loss so that the model converges faster and segments polyp boundaries more accurately. Our model was trained on the Kvasir-SEG and CVC-ClinicDB datasets and validated on Kvasir-SEG, CVC-ColonDB, CVC-300, and ETIS. The results show that the model achieves good segmentation accuracy and generalization performance with a low number of parameters, outperforming VM-UNet by 6.1% on the Kvasir-SEG dataset and by 3.1% on the CVC-ClinicDB dataset.
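The Tversky loss mentioned above generalizes the Dice loss by weighting false positives and false negatives separately. The paper's exact weighting and implementation are not given here; the following is a minimal binary-segmentation sketch in PyTorch, with the `alpha`/`beta` defaults chosen for illustration only (at `alpha = beta = 0.5` it reduces to the Dice loss):

```python
import torch

def tversky_loss(logits, target, alpha=0.5, beta=0.5, eps=1e-6):
    """Tversky loss for binary segmentation.

    alpha weights false positives, beta weights false negatives;
    alpha = beta = 0.5 recovers the Dice loss.
    """
    pred = torch.sigmoid(logits)              # map logits to probabilities
    tp = (pred * target).sum()                # soft true positives
    fp = (pred * (1.0 - target)).sum()        # soft false positives
    fn = ((1.0 - pred) * target).sum()        # soft false negatives
    tversky_index = (tp + eps) / (tp + alpha * fp + beta * fn + eps)
    return 1.0 - tversky_index
```

Raising `beta` above `alpha` penalizes missed polyp pixels more heavily, which is one common way such a loss is tuned toward better boundary recall.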