Efficient cross-information fusion decoder for semantic segmentation

Songyang Zhang, Ge Ren, Xiaoxi Zeng, Liang Zhang, Kailun Du, Gege Liu, Hong Lin

doi:10.1016/j.cviu.2023.103918

Abstract

In fine-scale prediction tasks such as semantic segmentation, existing models struggle to segment details accurately because the deep semantic features generated by the encoder are difficult to assign to shallow features, which leaves the segmentation of fine details ambiguous. In addition, high-precision models often require large amounts of computational resources. To address both problems, we design an efficient cross-information fusion decoder (ECFD). Within the ECFD, we design a cross-information fusion block (CFB) that uses contextual information to assign semantic information to the shallow features in the spatial domain, thus facilitating the classification of the details of segmented objects. To reduce the computational cost of the model, we adopt the same decoder structure as the efficient SenFormer: the feature pyramid structure. Compared with SenFormer, ECFD-Swin-Large reduces the number of parameters and the number of floating-point operations by one third, and achieves mIoU values of 83.61% and 64.98% on the benchmark datasets Cityscapes and Pascal Context, respectively, outperforming SenFormer, especially in detailed segmentation. In addition, 69.19% is obtained on BDD100K. The code is publicly available at https://github.com/songyang-xiaobai/ECFD-main.
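The results above are reported as mIoU (mean intersection over union). As a reminder of what this metric measures, the following is a minimal sketch of the standard definition, not the evaluation code used by the authors. The function name `mean_iou`, the flattened label lists, and the ignore label 255 are assumptions made for illustration. Benchmark scores are normally accumulated over a whole dataset, whereas this sketch scores a single input.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over the classes that occur in pred or target.

    pred, target: flat sequences of integer class labels of equal length.
    ignore_index: hypothetical label for pixels excluded from scoring.
    Predictions are assumed to lie in the range [0, num_classes).
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled pixel, not scored
        if p == t:
            # Correct pixel: counts once in both intersection and union.
            intersection[t] += 1
            union[t] += 1
        else:
            # Wrong pixel: enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    # Average only over classes that actually appear.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
score = mean_iou(pred, target, num_classes=3)
```

In this toy input the per-class IoU values are 1/2, 2/3, and 1, so `score` is their mean, 13/18. Because every class contributes equally regardless of its pixel count, errors on small or thin objects weigh as much as errors on large regions.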

Full Text