Abstract

The encoder-decoder structure is the basic architecture of most semantic segmentation models and is adopted by a large number of them. How to effectively extract image features and achieve high-precision mapping through the optimal design of the encoder and decoder is a key issue in current research. SegFormer designs an encoder with excellent performance that fully extracts feature information at different semantic granularities from the image with a large receptive field; even with a simple fully connected decoder, it achieves excellent segmentation results. However, this simplified decoder does not make full use of the advantages of the SegFormer encoder. Therefore, this paper designs a decoder with dual-path multi-scale feature fusion, redesigned according to the characteristics of the SegFormer encoder. The decoder adopts a dual-path structure. One path passes abstract global information layer by layer down to the local detail information through a layer-by-layer upsampling fusion module (LFM), which gradually upsamples the feature maps obtained from the encoder and then uses a channel fusion module to learn the importance of different channels in the deep abstract semantic feature map and the shallow local detail feature map, dynamically fusing them to obtain a feature map that contains both abstract semantic information and local details. The other path takes advantage of the large receptive field of the feature maps output by the SegFormer encoder and uses a weighted hybrid multi-scale feature extraction module (WMF) to extract multi-scale features containing global semantics from the deepest semantic feature map output by the encoder. Finally, a deep feature fusion module (DFM) fuses the outputs of the two paths, fully mining the multi-scale global information in the encoder and obtaining feature maps with rich semantic information, which effectively improves the performance of the model.
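
The sketch below illustrates one possible reading of the dual-path decoder described above, written in PyTorch. The module names (LFM, WMF, DFM, channel fusion) come from the abstract; their internal operations (1x1 projections, a pooling-based channel gate, learned-weight dilated branches) and the SegFormer-B0-like stage shapes are assumptions for illustration only, not the authors' exact design.

```python
# Hedged sketch of the dual-path multi-scale decoder; internals are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ChannelFusion(nn.Module):
    """Learn per-channel weights and dynamically fuse deep and shallow features (assumed design)."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, deep, shallow):
        w = self.gate(torch.cat([deep, shallow], dim=1))
        return w * deep + (1 - w) * shallow


class LFM(nn.Module):
    """Layer-by-layer upsampling fusion: propagate global semantics down to local details."""
    def __init__(self, in_channels, out_channels=128):
        super().__init__()
        self.proj = nn.ModuleList(
            nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels
        )
        self.fuse = nn.ModuleList(
            ChannelFusion(out_channels) for _ in in_channels[:-1]
        )

    def forward(self, feats):            # feats: shallow -> deep encoder stages
        feats = [p(f) for p, f in zip(self.proj, feats)]
        x = feats[-1]                     # start from the deepest (most abstract) map
        for fuse, skip in zip(self.fuse[::-1], feats[-2::-1]):
            x = F.interpolate(x, size=skip.shape[-2:], mode="bilinear",
                              align_corners=False)
            x = fuse(x, skip)             # dynamic channel-wise fusion with the shallower map
        return x


class WMF(nn.Module):
    """Weighted hybrid multi-scale extraction from the deepest encoder output (assumed: dilated branches)."""
    def __init__(self, in_channels, out_channels=128, rates=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_channels, out_channels, kernel_size=3,
                      padding=r, dilation=r) for r in rates
        )
        self.weights = nn.Parameter(torch.ones(len(rates)))  # learned branch weights

    def forward(self, x):
        w = torch.softmax(self.weights, dim=0)
        return sum(wi * b(x) for wi, b in zip(w, self.branches))


class DFM(nn.Module):
    """Deep feature fusion of the two decoder paths, followed by pixel classification."""
    def __init__(self, channels, num_classes):
        super().__init__()
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)
        self.cls = nn.Conv2d(channels, num_classes, kernel_size=1)

    def forward(self, lfm_out, wmf_out):
        wmf_out = F.interpolate(wmf_out, size=lfm_out.shape[-2:],
                                mode="bilinear", align_corners=False)
        return self.cls(F.relu(self.fuse(torch.cat([lfm_out, wmf_out], dim=1))))


if __name__ == "__main__":
    # Hypothetical SegFormer-B0-like stage outputs for a 512x512 input.
    feats = [torch.randn(1, c, s, s)
             for c, s in zip((32, 64, 160, 256), (128, 64, 32, 16))]
    lfm, wmf, dfm = LFM([32, 64, 160, 256]), WMF(256), DFM(128, num_classes=19)
    print(dfm(lfm(feats), wmf(feats[-1])).shape)  # torch.Size([1, 19, 128, 128])
```

Here the LFM path recovers spatial detail while keeping the deep semantics, the WMF path enlarges the effective receptive field on the deepest map, and the DFM merges the two before classification; the exact layer choices inside each module are placeholders under the assumptions stated above.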
