Abstract

Objective. Deep convolutional neural networks (CNNs) have been widely applied in medical image analysis and have achieved strong performance. While most CNN-based methods exhibit powerful feature representation capabilities, they struggle to encode long-range interactions because of their limited receptive fields. The Transformer has recently been proposed to alleviate this issue, but at the cost of a greatly enlarged model size, which may hinder its practical deployment. Approach. To obtain strong long-range interaction modeling with a small model size, we propose a Transformer-like block-based U-shaped network for medical image segmentation, dubbed SCA-Former. We further propose a novel stream-cross attention (SCA) module that encourages the network to balance local and global representations by extracting multi-scale, interactive features along the spatial and channel dimensions. SCA effectively captures channel, multi-scale spatial, and long-range information for a more comprehensive feature representation. Main results. Experimental results demonstrate that SCA-Former outperforms current state-of-the-art (SOTA) methods on three public datasets: GLAS, ISIC 2017, and LUNG. Significance. This work presents a promising way to enhance the feature representation of convolutional neural networks and improve segmentation performance.
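The abstract does not specify the internals of the SCA module, but the idea of fusing a channel-attention stream with a multi-scale spatial stream can be illustrated generically. The sketch below is a hypothetical NumPy illustration, not the paper's implementation: the names `channel_attention`, `multiscale_spatial_attention`, and `sca_block`, the softmax channel gating, and the pooling scales are all assumptions made for demonstration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def channel_attention(x):
    # x: (C, H, W). Squeeze spatial dims, then re-weight each channel
    # (a squeeze-and-excitation-style gate; assumed for illustration).
    squeeze = x.mean(axis=(1, 2))            # (C,)
    weights = softmax(squeeze)               # hypothetical channel gate
    return x * weights[:, None, None]

def multiscale_spatial_attention(x, scales=(1, 2, 4)):
    # Average-pool at several scales, upsample back, and fuse into a
    # single sigmoid spatial gate (assumes H and W divisible by each scale).
    C, H, W = x.shape
    fused = np.zeros_like(x)
    for s in scales:
        pooled = x.reshape(C, H // s, s, W // s, s).mean(axis=(2, 4))
        up = np.repeat(np.repeat(pooled, s, axis=1), s, axis=2)
        fused += up
    gate = 1.0 / (1.0 + np.exp(-(fused / len(scales)).mean(axis=0)))  # (H, W)
    return x * gate[None, :, :]

def sca_block(x):
    # Two parallel attention streams whose outputs are fused additively.
    return channel_attention(x) + multiscale_spatial_attention(x)

x = np.random.rand(8, 16, 16).astype(np.float32)
y = sca_block(x)
print(y.shape)  # same shape as the input feature map
```

In a real network each stream would be a learned sub-module inside a Transformer-like block; this sketch only shows the data flow of combining channel-wise and multi-scale spatial gating.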
