The attention mechanism comes as a new entry point for improving the performance of medical image segmentation. How to reasonably assign weights is a key element of the attention mechanism, and the current popular schemes include the global squeezing and the non-local information interactions using self-attention (SA) operation. However, these approaches over-focus on external features and lack the exploitation of latent features. The global squeezing approach crudely represents the richness of contextual information by the global mean or maximum value, while non-local information interactions focus on the similarity of external features between different regions. Both ignore the fact that the contextual information is presented more in terms of the latent features like the frequency change within the data. To tackle above problems and make proper use of attention mechanisms in medical image segmentation, we propose an external-latent attention collaborative guided image segmentation network, named TransGuider. This network consists of three key components: 1) a latent attention module that uses an improved entropy quantification method to accurately explore and locate the distribution of latent contextual information. 2) an external self-attention module using sparse representation, which can preserve external global contextual information while reducing computational overhead by selecting representative feature description map for SA operation. 3) a multi-attention collaborative module to guide the network to continuously focus on the region of interest, refining the segmentation mask. Our experimental results on several benchmark medical image segmentation datasets show that TransGuider outperforms the state-of-the-art methods, and extensive ablation experiments demonstrate the effectiveness of the proposed components. Our code will be available at https://github.com/chasingone/TransGuider.
Read full abstract