Depth maps provide supplementary information for salient object detection (SOD) and are particularly helpful in complex scenes. However, most existing RGB-D methods fuse depth cues only with features at the same level, and few exploit the information flow between cross-level features. In this study, we propose a Progressive Cross-level Fusion Network (PCF-Net). By progressively exploring deeper features, it enables cross-level features to flow across levels, promoting the interaction and fusion of information between features at different levels. First, we design a Cross-Level Guided Cross-Modal Fusion module (CGCF) that uses the spatial information of upper-level features to suppress modal feature noise and to guide lower-level features in cross-modal fusion. Next, the proposed Semantic Enhancement Module (SEM) and Local Enhancement Module (LEM) further introduce deeper features, enhancing the high-level semantic information and low-level structural information of the cross-modal features, with self-modality attention refinement to strengthen the enhancement. Finally, a multi-scale aggregation decoder mines the enhanced features in multi-scale spaces and effectively integrates cross-scale features. Extensive experiments demonstrate that the proposed PCF-Net outperforms 16 state-of-the-art methods on six popular RGB-D SOD datasets.
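To make the cross-level guidance idea concrete, the following is a minimal PyTorch sketch of the kind of mechanism the CGCF description suggests: a spatial attention map derived from the upper-level feature gates the lower-level RGB and depth features before cross-modal fusion. This is an illustrative assumption, not the authors' implementation; the class name, layer choices, and tensor shapes are hypothetical.

```python
# Illustrative sketch only (not the authors' code): an upper-level feature
# produces a spatial gate that suppresses noise in lower-level RGB/depth
# features, which are then fused across modalities.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossLevelGuidedFusion(nn.Module):  # hypothetical name
    def __init__(self, channels: int):
        super().__init__()
        # 1x1 conv collapses the upper-level feature to a 1-channel spatial gate
        self.gate = nn.Conv2d(channels, 1, kernel_size=1)
        # 3x3 conv fuses the concatenated, gated RGB/depth features
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, rgb, depth, upper):
        # Upsample the coarser upper-level feature to the lower level's size
        upper = F.interpolate(upper, size=rgb.shape[-2:],
                              mode="bilinear", align_corners=False)
        # Spatial attention from the upper level suppresses noisy regions
        attn = torch.sigmoid(self.gate(upper))
        rgb_g, depth_g = rgb * attn, depth * attn
        # Cross-modal fusion of the guided features
        return F.relu(self.fuse(torch.cat([rgb_g, depth_g], dim=1)))

if __name__ == "__main__":
    m = CrossLevelGuidedFusion(channels=64)
    rgb = torch.randn(1, 64, 88, 88)    # lower-level RGB feature (assumed shape)
    depth = torch.randn(1, 64, 88, 88)  # lower-level depth feature
    upper = torch.randn(1, 64, 44, 44)  # upper-level fused feature
    print(m(rgb, depth, upper).shape)   # torch.Size([1, 64, 88, 88])
```

In this sketch, gating both modalities with the same upper-level map is one plausible way to realize "upper-level features suppress modal feature noise"; the paper's actual module may differ in structure and detail.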