MSSA‐Net: A novel multi‐scale feature fusion and global self‐attention network for lesion segmentation

Zhaohong Huang, Xiangchen Zhang, Guowei Zhang, Guorong Cai

doi:10.1002/cpe.7060

Abstract

In medical image segmentation tasks, it is typical to adopt convolutional neural networks with a serial encoder‐decoder structure. However, mainstream networks cannot simultaneously extract sufficient global features and fuse multi‐scale information, which may lead to unsatisfactory results when segmenting pathological images. Therefore, this article proposes a novel multi‐scale feature fusion and global self‐attention network (MSSA‐Net) for medical image segmentation. Specifically, we design a parallel double‐encoder network with a multi‐scale feature fusion encoder (MS‐Encoder) and a self‐attention encoder (SA‐Encoder). The SA‐Encoder introduces the transformer's global self‐attention mechanism to extract global features, and the MS‐Encoder adopts atrous spatial pyramid pooling (ASPP) to realize multi‐scale fusion. We evaluate the proposed MSSA‐Net on three medical segmentation datasets covering imaging modalities such as colonoscopy and magnetic resonance imaging. Experiments on CVC‐ClinicDB, the 2015 MICCAI subchallenge on automatic polyp detection dataset, and anatomical tracings of lesions after stroke (ATLAS) show that MSSA‐Net outperforms mainstream methods such as DoubleU‐Net and TransUNet. Moreover, MSSA‐Net predicts more accurate segmentation masks, especially on ATLAS, which contains challenging images with multiple shadow areas and discrete lesions.
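The two ideas behind the parallel encoders can be illustrated with a toy one‐dimensional sketch in pure Python. This is not the MSSA‐Net implementation: the function names, the dilation rates, the averaging of branches (real ASPP typically concatenates branches and mixes them with a 1x1 convolution), and the identity query/key/value projections are all simplifying assumptions made only for illustration.

```python
import math


def dilated_conv1d(signal, kernel, rate):
    """Zero-padded 'same' 1-D cross-correlation with dilation `rate` (hypothetical helper)."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + (k - half) * rate  # a larger rate samples neighbours further away
            if 0 <= j < len(signal):
                acc += w * signal[j]
        out.append(acc)
    return out


def aspp_1d(signal, kernel, rates=(1, 2, 4)):
    """ASPP-like multi-scale fusion: same kernel at several rates, then averaged.

    Averaging is an assumption chosen for brevity, not the paper's fusion rule.
    """
    branches = [dilated_conv1d(signal, kernel, r) for r in rates]
    return [sum(vals) / len(branches) for vals in zip(*branches)]


def self_attention_1d(signal):
    """Scaled dot-product self-attention over scalar tokens.

    Q, K and V are the raw tokens (identity projections, an assumption);
    with feature dimension 1 the 1/sqrt(d) scale equals 1.
    """
    out = []
    for q in signal:
        scores = [q * k for k in signal]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append(sum(w * v for w, v in zip(weights, signal)) / total)
    return out


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(aspp_1d(x, [1.0, 1.0, 1.0]))  # local context at several scales
    print(self_attention_1d(x))         # every output mixes all positions
```

Each output of the dilated branch depends only on a few positions spaced by the dilation rate, whereas each output of the attention branch is a weighted average over every position, which is why the two encoders are complementary.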

Full Text