Attention-based methods for medical image segmentation have achieved strong results. However, these attention mechanisms primarily evaluate explicit response values and ignore the information carried by the implicit energy, i.e., the frequency of grayscale change contained in the data. This deficiency limits their ability to aggregate valid features in medical image segmentation tasks. We propose an efficient network for medical image segmentation, named TransMIS. The network designs an implicit attention mechanism that works together with the explicit attention mechanism, and it fuses multi-level features with long-range and local dependencies to construct a robust feature representation, which improves segmentation performance. Specifically, TransMIS introduces two new components: (1) a Multi-Scale Compression Self-Attention Mechanism (MSCS) that uses a sparse representation in place of complex feature tensors for matrix multiplication, which enhances feature representation while reducing the amount of computation; and (2) an Explicit–Implicit Channel Mixer Module (CMM) that combines implicit energy quantification with explicit response value evaluation, which bridges the discrepancies between different semantic information and fuses multi-level features effectively. Extensive experiments on multiple benchmark datasets show that TransMIS outperforms current state-of-the-art methods for medical image segmentation. We also conduct rigorous ablation experiments to verify the effectiveness of the proposed components.
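The computational saving claimed for MSCS can be illustrated with a small sketch. The abstract does not specify how the compression is performed, so the code below is not the TransMIS implementation: it assumes, purely for illustration, that the compressed representation is obtained by average-pooling the token sequence at several window sizes, and that the pooled tokens serve as both keys and values with no learned projections. The names `avg_pool_tokens`, `compressed_attention`, and `windows` are hypothetical.

```python
import math


def avg_pool_tokens(tokens, window):
    """Average consecutive groups of `window` tokens (the last group may be shorter)."""
    pooled = []
    for start in range(0, len(tokens), window):
        group = tokens[start:start + window]
        dim = len(group[0])
        pooled.append([sum(t[i] for t in group) / len(group) for i in range(dim)])
    return pooled


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def compressed_attention(tokens, windows=(2, 4)):
    """Attend from every token to pooled summaries taken at several scales.

    Assumption (not from the paper): the pooled tokens act as both keys and
    values, so each query is compared with M summary tokens instead of all
    N tokens, where M is the total number of pooled tokens across scales.
    """
    dim = len(tokens[0])
    summary = [p for w in windows for p in avg_pool_tokens(tokens, w)]
    scale = 1.0 / math.sqrt(dim)
    output = []
    for query in tokens:
        scores = [scale * sum(a * b for a, b in zip(query, key)) for key in summary]
        weights = softmax(scores)
        output.append(
            [sum(w * value[i] for w, value in zip(weights, summary)) for i in range(dim)]
        )
    return output, len(summary)
```

Under these assumptions, a sequence of N tokens pooled with windows 2 and 4 yields roughly 3N/4 summary tokens, so the score matrix has N × 3N/4 entries rather than N × N; larger windows reduce it further at the cost of coarser context.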