Abstract

Medical image segmentation helps clinicians accurately identify and localize structures, tissues, and lesions in a patient's body, providing an important basis for clinical diagnosis. However, segmentation of medical images is hampered by the complexity and morphological variety of the imaged structures, uneven contrast, and blurred boundaries between target tissues and the background. We therefore propose a Multiscale Gated Axial Reverse Attention Transformer (MSGAT) for medical image segmentation. First, we use the Pyramid Vision Transformer (PVT) as the backbone network to extract high-level features and introduce the Channel Receptive Field Block (CRFB) to filter interfering information from them; the filtered high-level features are then enhanced and aggregated along two paths. On one path, the Multiscale Feature Enhancement Module (MSFEM) captures multiscale contextual information to strengthen the positional cues of the target; on the other, the Multiscale Parallel Partial Decoder (MSPPD) aggregates the high-level feature maps into a rough segmentation map. Finally, the rough segmentation map, combined with the positional information in the enhanced features, guides the gated axial reverse attention (GARA) to refine the segmentation boundaries and progressively mine the target's detailed information. MSGAT is evaluated against state-of-the-art methods on skin cancer (ISIC2016, PH2), polyp (Kvasir), and gland (GlaS) datasets. The results demonstrate MSGAT's superiority, with average improvements of 6.19% in mIoU and 9.35% in DSC over existing methods.
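The abstract describes a multi-stage pipeline: backbone feature extraction (PVT), feature filtering (CRFB), parallel enhancement (MSFEM) and aggregation (MSPPD), and refinement guided by the rough map (GARA). As a rough illustration of that data flow only, here is a minimal sketch in plain Python; every function below is a hypothetical placeholder standing in for a network module, not the authors' implementation.

```python
# Hypothetical sketch of the MSGAT data flow described in the abstract.
# Feature maps are modeled as plain lists of floats; all functions are
# illustrative placeholders, not the paper's actual network modules.

def backbone_pvt(image):
    """Stand-in for the PVT backbone: produce three high-level feature maps."""
    return [[v * s for v in image] for s in (0.5, 0.25, 0.125)]

def crfb(feat):
    """Stand-in for the CRFB: filter out weak (interfering) responses."""
    return [v if abs(v) > 0.1 else 0.0 for v in feat]

def msfem(feats):
    """Stand-in for the MSFEM: fuse multiscale context into one enhanced map."""
    return [sum(vs) for vs in zip(*feats)]

def msppd(feats):
    """Stand-in for the MSPPD: aggregate high-level maps into a rough map."""
    return [sum(vs) / len(vs) for vs in zip(*feats)]

def gara(rough, enhanced):
    """Stand-in for GARA: refine the rough map using the enhanced features."""
    return [r + 0.5 * e for r, e in zip(rough, enhanced)]

def msgat(image):
    """End-to-end data flow: backbone -> CRFB -> (MSFEM, MSPPD) -> GARA."""
    feats = [crfb(f) for f in backbone_pvt(image)]
    enhanced = msfem(feats)   # positional cues from multiscale context
    rough = msppd(feats)      # rough segmentation map
    return gara(rough, enhanced)

print(msgat([1.0, 0.2, 0.05]))
```

The sketch only shows how the coarse map from the decoder path and the enhanced features from the enhancement path are combined in the final refinement stage; the real modules operate on 2-D feature tensors with learned weights.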
