Enhanced multi-scale networks for semantic segmentation

Tianping Li,Zhaotong Cui,Yu Han,Guanxing Li,Meng Li,Dongmei Wei

doi:10.1007/s40747-023-01279-x

Open Access

https://doi.org/10.1007/s40747-023-01279-x

Journal: Complex & Intelligent Systems
Publication Date: Dec 4, 2023
Citations: 5
License type: CC BY 4.0

Affiliation: Shandong Normal University

Abstract

Multi-scale representation is an effective response to the scale variation of objects and entities in semantic segmentation. Capturing long-range pixel dependencies also facilitates segmentation, as does exploiting pixel-to-pixel similarity along the channel direction to enhance pixel regions. Reviewing the characteristics of earlier successful segmentation models, we identify several elements that improve performance: a robust encoder structure, multi-scale interactions, attention mechanisms, and a robust decoder structure. The asymmetric non-local neural network (ANNet) merges its attention mechanism with multi-scale pyramidal modules to accelerate segmentation while maintaining high accuracy. However, ANNet does not account for the similarity between pixels along the channel direction of the feature map, which leaves its segmentation accuracy unsatisfactory. We therefore propose EMSNet, a straightforward convolutional network architecture for semantic segmentation that consists of an Integration of enhanced regional module (IERM) and a Multi-scale convolution module (MSCM). The IERM generates weights from the four- or five-stage feature maps, then fuses the input features with these weights, which requires more computation. The similarity of the feature maps along the channel direction is also calculated using ANNet’s auxiliary loss function. The MSCM more accurately describes the interactions between channels, captures the interdependencies between feature pixels, and captures multi-scale context. Experiments show that EMSNet performs well on benchmark datasets: it reaches 82.2% segmentation accuracy on the Cityscapes test data, and mIoU of 45.58% and 85.46% on the ADE20k and Pascal VOC datasets, respectively.
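
For readers unfamiliar with the mIoU metric quoted above, the following is a minimal sketch of how mean intersection-over-union can be computed from flattened label sequences. It is not the authors' evaluation code: the function name `mean_iou`, the flat-list input format, and the `ignore_index=255` default (a common convention for unlabeled pixels, assumed here) are illustrative choices. Benchmark toolkits usually accumulate the counts over the whole dataset before averaging, whereas this sketch scores a single pair of sequences.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over the classes that occur in pred or target.

    pred, target: flat sequences of integer class labels, same length.
    ignore_index: target label to skip (assumed convention, not from the paper).
    """
    inter = [0] * num_classes  # pixels where pred and target agree on class c
    union = [0] * num_classes  # pixels where pred or target equals class c

    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled pixel: excluded from the score
        if p == t:
            inter[t] += 1
            union[t] += 1
        else:
            # A mismatch enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1

    # Average only over classes that actually appear, to avoid 0/0.
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


if __name__ == "__main__":
    pred = [0, 0, 1, 1, 2, 2]
    target = [0, 1, 1, 1, 2, 255]
    print(mean_iou(pred, target, num_classes=3))
```

In the small example, the per-class IoU values are 1/2, 2/3, and 1, and the last pixel is skipped because its target label is the ignore value.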

Full Text