Abstract
To address poor segmentation accuracy and insensitivity to detail in semantic segmentation, this paper proposes SEFANet, a novel image semantic segmentation framework. SEFANet adopts an encoder-decoder structure and incorporates a novel perceptual enhancement mechanism, the Multi-scale Spatial Integration Module (MSIM), in the encoder. MSIM uses group convolution to boost spatial semantic features and refine spatial-gradient semantic features within a multi-scale structure. The module enhances feature extraction across different network levels, which improves edge detection and segmentation. In the decoder, SEFANet introduces a pixel-level Interleaved Feature Alignment Module (IFAM), which leverages the rich semantic information in low-dimensional features together with a Semantic Offset Field strategy. IFAM warps the high-dimensional feature map onto the low-dimensional features and completes the calibration through convolution operations. Experimental results on the Pascal VOC2012 val set and the Cityscapes val set confirm the effectiveness and generalization of the proposed method. The results further show that SEFANet mitigates poor segmentation accuracy and insensitivity to detail, and that it achieves competitive performance compared with other semantic segmentation methods.
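The warping step attributed to IFAM can be illustrated with a generic offset-field warp. The abstract does not specify how the Semantic Offset Field is predicted or applied, so the following is only a minimal sketch under stated assumptions: a single-channel coarse map stands in for the feature map being warped, the offsets are supplied by hand rather than predicted by convolutions, sampling is bilinear with border clamping, and the names `bilinear_sample` and `warp_to_fine_grid` are hypothetical.

```python
import math


def bilinear_sample(feat, y, x):
    """Sample a 2D grid (list of rows) at fractional (y, x), clamped to the border."""
    h, w = len(feat), len(feat[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    top = feat[y0][x0] * (1 - wx) + feat[y0][x1] * wx
    bottom = feat[y1][x0] * (1 - wx) + feat[y1][x1] * wx
    return top * (1 - wy) + bottom * wy


def warp_to_fine_grid(coarse, offsets):
    """Warp a coarse map onto the finer grid defined by `offsets`.

    offsets[i][j] is a (dy, dx) pair, in coarse-grid units, for fine pixel (i, j).
    Assumption: in a real network these offsets would be predicted from the
    features; here they are plain inputs.
    """
    out_h, out_w = len(offsets), len(offsets[0])
    in_h, in_w = len(coarse), len(coarse[0])
    # Map fine-grid indices to coarse-grid coordinates (corners aligned).
    sy = (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
    sx = (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
    warped = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            dy, dx = offsets[i][j]
            row.append(bilinear_sample(coarse, i * sy + dy, j * sx + dx))
        warped.append(row)
    return warped


coarse = [[0.0, 1.0],
          [2.0, 3.0]]
zero_offsets = [[(0.0, 0.0)] * 3 for _ in range(3)]
upsampled = warp_to_fine_grid(coarse, zero_offsets)
```

With all offsets set to zero the warp reduces to plain bilinear upsampling; non-zero offsets shift where each fine-grid pixel reads from the coarse map, which is the general idea behind offset-based feature alignment.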