Superpixels, as essential mid-level image representations, have been widely used in computer vision because they are computationally efficient and compress redundant image information. Compared with traditional superpixel methods, superpixel algorithms based on deep learning frameworks demonstrate significant advantages in segmentation accuracy. However, existing deep learning-based superpixel algorithms lose detail through the convolution and upsampling operations in their encoder–decoder structure, which weakens their semantic detection capability. To overcome these limitations, we propose a novel superpixel segmentation network based on a multi-attention hybrid network (MAS-Net). MAS-Net retains an efficient symmetric encoder–decoder architecture. First, a residual structure built on a parameter-free attention module is used at the feature-encoding stage to enhance the capture of fine-grained features. Second, a global semantic fusion self-attention module is adopted at the feature-selection stage to reconstruct the feature map. Finally, the channel and spatial attention mechanisms are fused at the feature-decoding stage to obtain superpixel segmentation results with enhanced boundary adherence. Experimental results on real-world image datasets demonstrate that the proposed method achieves competitive results in terms of visual quality and metrics, such as ASA and BR-BP, compared with state-of-the-art approaches.
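For readers unfamiliar with the ASA metric cited above, the following is a minimal sketch of achievable segmentation accuracy as it is commonly defined: each superpixel is assigned the ground-truth segment it overlaps most, and ASA is the fraction of pixels labeled correctly under that assignment. The function name and the toy label maps are illustrative assumptions; this is not the evaluation code used for MAS-Net.

```python
import numpy as np


def achievable_segmentation_accuracy(superpixels, ground_truth):
    """Common ASA definition (illustrative helper, not the paper's code).

    Both inputs are integer label maps of the same shape.
    """
    sp = np.asarray(superpixels).ravel()
    gt = np.asarray(ground_truth).ravel()
    if sp.shape != gt.shape:
        raise ValueError("label maps must have the same number of pixels")

    # Map arbitrary label values to contiguous ids 0..K-1.
    _, sp_ids = np.unique(sp, return_inverse=True)
    _, gt_ids = np.unique(gt, return_inverse=True)

    # overlap[i, j] = number of pixels shared by superpixel i and segment j.
    overlap = np.zeros((sp_ids.max() + 1, gt_ids.max() + 1), dtype=np.int64)
    np.add.at(overlap, (sp_ids, gt_ids), 1)

    # Each superpixel contributes its largest overlap with any segment.
    return overlap.max(axis=1).sum() / sp.size


# Toy example (assumed data): two superpixels, two ground-truth segments.
sp_map = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1]])
gt_map = np.array([[0, 0, 0, 1],
                   [0, 0, 0, 1]])
asa = achievable_segmentation_accuracy(sp_map, gt_map)
```

In this toy case the first superpixel lies entirely inside one segment, while the second straddles the segment boundary, so only part of it can be labeled correctly. Higher ASA indicates that superpixels leak less across object boundaries.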