Abstract

Medical image segmentation aims to delineate objects of interest from surrounding tissues and structures, which is essential for reliable diagnosis and morphological analysis of specific lesions. Automatic medical image segmentation has been significantly advanced by deep Convolutional Neural Networks (CNNs). However, CNNs usually fail to model long-range interactions due to the intrinsic locality of convolutional operations, which limits segmentation performance. Recently, the Transformer has been successfully applied to various computer vision tasks, leveraging the self-attention mechanism to model long-range interactions and capture global information. Nevertheless, self-attention lacks spatial locality and computational efficiency. To address these issues, we develop a new sparse medical Transformer (SMTF) with multiscale contextual fusion for medical image segmentation. The proposed model combines convolutional operations and attention mechanisms in a U-shaped framework to capture both local and global information. Specifically, to reduce the computational cost of the traditional Transformer, we design a novel sparse attention module that constructs Transformer layers via spherical Locality Sensitive Hashing (LSH). The sparse attention partitions the feature space into attention buckets, and attention is computed only within each individual bucket. The designed sparse Transformer layer is further combined with a bottleneck block to construct the encoder of SMTF. Notably, the proposed sparse Transformer can also aggregate global feature information in early stages, which enables the model to learn more local and global information by incorporating CNNs at the lower layers. Furthermore, we introduce a deep supervision strategy to guide the model in fusing multiscale feature information. This enables SMTF to effectively propagate features across layers, preserving more of the input's spatial information and mitigating information attenuation. Benefiting from these designs, SMTF achieves better segmentation performance while being more robust and efficient. The proposed SMTF is evaluated on multiple medical image segmentation datasets and a clinical nasopharyngeal carcinoma dataset. Extensive experiments demonstrate its superiority in both qualitative and quantitative evaluations. Code and models are available at https://github.com/qmx717/sparse-attention.git.
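
To make the bucketing idea concrete, the following is a minimal PyTorch sketch of spherical-LSH-based sparse attention. It is not the authors' released implementation (see the linked repository for that): the hash follows the cross-polytope/spherical LSH scheme the abstract describes, but the function names (`spherical_lsh_buckets`, `bucketed_attention`), the shared query+key hash, and the per-bucket Python loop are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def spherical_lsh_buckets(x, n_buckets, seed=0):
    """Assign each token to a bucket via random spherical projections.

    x: (batch, seq_len, dim) token features, assumed L2-normalized.
    Returns integer bucket ids of shape (batch, seq_len).
    n_buckets must be even (each hyperplane yields a +/- bucket pair).
    """
    g = torch.Generator().manual_seed(seed)
    # Random projection directions; taking argmax over signed projections
    # implements a cross-polytope (spherical) LSH: nearby points on the
    # sphere are likely to land in the same bucket.
    proj = torch.randn(x.size(-1), n_buckets // 2, generator=g)
    rotated = torch.einsum("bld,dk->blk", x, proj)
    rotated = torch.cat([rotated, -rotated], dim=-1)  # (batch, seq, n_buckets)
    return rotated.argmax(dim=-1)

def bucketed_attention(q, k, v, n_buckets=8):
    """Compute attention only within LSH buckets.

    Cost scales with the bucket sizes rather than the full O(n^2)
    of dense self-attention.
    """
    b, n, d = q.shape
    # Hash queries and keys jointly so that matching tokens share a bucket.
    buckets = spherical_lsh_buckets(F.normalize(q + k, dim=-1), n_buckets)
    out = torch.zeros_like(v)
    for bucket_id in range(n_buckets):
        mask = buckets == bucket_id  # (batch, seq) boolean membership
        for i in range(b):
            idx = mask[i].nonzero(as_tuple=True)[0]
            if idx.numel() == 0:
                continue
            qi, ki, vi = q[i, idx], k[i, idx], v[i, idx]
            # Standard scaled dot-product attention, restricted to the bucket.
            attn = torch.softmax(qi @ ki.T / d ** 0.5, dim=-1)
            out[i, idx] = attn @ vi
    return out

if __name__ == "__main__":
    q = k = v = torch.randn(2, 128, 64)
    y = bucketed_attention(q, k, v, n_buckets=8)
    print(y.shape)  # torch.Size([2, 128, 64])
```

A practical implementation would sort tokens by bucket id and process equal-sized chunks in parallel instead of looping in Python, but the restriction of the softmax to each bucket is the mechanism that yields the sparsity described above.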
