Abstract

Transformer-based medical image segmentation networks remain limited in how they encode token positions and decode image features: the position encoding module does not capture position information adequately, and a serial decoder cannot use contextual information efficiently. In this paper, we propose APT-Net, a new CNN-transformer hybrid medical image segmentation network built on the encoder-decoder architecture. The network introduces an adaptive position encoding module that fuses position information from multiple receptive fields, providing richer position information for the token sequences in the transformer. In addition, a dual-path parallel decoder processes multiscale feature maps simultaneously along its basic and guide information paths to use contextual information efficiently. We conducted extensive experiments on seven datasets covering skin lesions, polyps, and glands, and report several metrics from multiple perspectives. The IoU reached 0.783 on the ISIC2017 dataset and 0.851 on the Glas dataset. To the best of our knowledge, APT-Net achieves state-of-the-art performance on the Glas dataset and on polyp segmentation tasks. Ablation experiments validate the effectiveness of the proposed adaptive position encoding module and dual-path parallel decoder, and comparative experiments with state-of-the-art methods demonstrate the high accuracy and portability of APT-Net.
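
For readers unfamiliar with the IoU (intersection over union) metric quoted above, the following minimal sketch shows how it is commonly computed for a pair of binary segmentation masks. It is an illustration only and not the evaluation code used for APT-Net: the function name `iou`, the nested-list mask format, and the convention that two empty masks score 1.0 are all assumptions made here.

```python
def iou(pred, target):
    """Intersection over union of two binary masks (nested lists of 0/1).

    Hypothetical helper for illustration; not taken from the APT-Net code.
    """
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            # A pixel is in the intersection if both masks mark it as
            # foreground, and in the union if at least one does.
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumed convention: two empty masks count as a perfect match.
    return inter / union if union else 1.0


# Toy 2x3 masks: 2 pixels overlap, 4 pixels are foreground in either mask.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
print(iou(pred, target))  # 0.5
```

In practice the metric is usually averaged over all images in a test set, so a reported value such as 0.851 would be a dataset-level mean rather than the score of a single mask.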
