Traditional pyramid pooling modules improve semantic segmentation by capturing multi-scale feature information. However, their shallow structure fails to fully extract contextual information, and the fused multi-scale features lack distinctiveness, which limits the discriminability of the final segmentation. To address these issues, we propose FCPFNet, an effective solution that uses a global contextual prior for deep extraction of detailed features. Specifically, we introduce a novel deep feature aggregation module that extracts semantic information from the output feature map of each layer through deep aggregation of context information, expanding the effective perception range. Additionally, we propose an Efficient Pyramid Pooling Module (EPPM) that captures distinctive features by communicating information between different sub-features and performing multi-scale fusion; it is integrated as a branch within the network to compensate for the information lost in downsampling operations. Furthermore, to preserve rich image detail while maintaining a large receptive field for contextual information, EPPM concatenates the input feature map with the output feature map of the pyramid pooling module, yielding more comprehensive global contextual information. Experiments demonstrate that the proposed method achieves competitive performance on the challenging scene segmentation datasets PASCAL VOC 2012, Cityscapes, and COCO-Stuff, with mIoU of 81.0%, 78.8%, and 40.1%, respectively.
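To make the EPPM description above concrete, here is a minimal PyTorch sketch of a pyramid-pooling branch with the two properties the abstract names: coarse-to-fine communication between pooled sub-features, and concatenation of the input map with the pooled outputs. All names, channel sizes, pooling scales, and the specific cross-branch fusion rule are illustrative assumptions, not the authors' implementation.

```python
# Sketch of an EPPM-style pyramid pooling branch. Hypothetical design:
# the paper's actual module may differ in scales, fusion, and projection.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PyramidPoolingSketch(nn.Module):
    """Pools the input at several scales, lets adjacent pooled sub-features
    exchange information (coarse context flows into finer branches), then
    concatenates the original input with the upsampled pooled maps so that
    detail is kept alongside global context, as the abstract describes."""

    def __init__(self, in_channels: int, scales=(1, 2, 3, 6)):
        super().__init__()
        branch_channels = in_channels // len(scales)
        self.stages = nn.ModuleList(
            nn.Sequential(
                nn.AdaptiveAvgPool2d(s),  # pool to an s x s context map
                nn.Conv2d(in_channels, branch_channels, 1, bias=False),
                nn.BatchNorm2d(branch_channels),
                nn.ReLU(inplace=True),
            )
            for s in scales
        )
        # Project the concatenation of the input and all pooled branches.
        fused = in_channels + branch_channels * len(scales)
        self.project = nn.Sequential(
            nn.Conv2d(fused, in_channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(in_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[-2:]
        feats = [x]  # keep the input map so fine detail is not lost
        prev = None
        for stage in self.stages:
            y = stage(x)
            if prev is not None:
                # Sub-feature communication (an illustrative choice):
                # inject the coarser branch's context into the finer one.
                prev_up = F.interpolate(prev, size=y.shape[-2:],
                                        mode="bilinear", align_corners=False)
                y = y + prev_up
            prev = y
            feats.append(F.interpolate(y, size=(h, w),
                                       mode="bilinear", align_corners=False))
        return self.project(torch.cat(feats, dim=1))
```

Under these assumptions, the module would be attached to a backbone feature map, e.g. `PyramidPoolingSketch(2048)` applied to a ResNet stage-5 output; concatenating `x` itself into `feats` is what distinguishes this branch from a standard pyramid pooling module, which fuses only the pooled features.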