Abstract
Transformer complements convolutional neural network (CNN) has achieved better performance than improved CNN-based methods. Specially, Transformer is utilized to be combined with U-shaped structure, skip-connections, encoder, and even them all together. However, the intermediate supervision network based on the coarse-to-fine strategy has not been combined with Transformer to improve the generalization of CNN-based methods. In this paper, we propose Swin-PANet, which is applying a window-based self-attention mechanism by Swin Transformer in the intermediate supervision network, called prior attention network. A new enhanced attention block based on CCA is also proposed to aggregate the features from skip-connections and prior attention network, and further refine details of boundaries. Swin-PANet can address the dilemma that traditional Transformer network has poor interpretability in the process of attention calculation and Swin-PANet can insert its attention predictions into prior attention network for intermediate supervision learning which is humanly interpretable and controllable. Hence, the intermediate supervision network assisted by Swin Transformer provides better attention learning and interpretability in network for accurate and automatic medical image segmentation. The experimental results evaluate the effectiveness of Swin-PANet which outperforms state-of-the-art methods in some famous medical segmentation tasks including cell and skin lesion segmentation.
Published Version (Free)
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have