Abstract
Most existing RGB-D salient object detection (SOD) methods either extract features from the two modalities in parallel or treat depth features as supplementary information, allowing only unidirectional interaction from the depth modality to the RGB modality in the encoder stage. These methods ignore the influence of low-quality depth maps, and there remains room for improvement in fusing RGB and depth features effectively. To address these problems, this paper proposes a Feature Interaction Network (FINet), which performs bi-directional interaction through a feature interaction module (FIM) in the encoder stage. The FIM consists of two parts: a depth enhancement module (DEM), which filters noise in the depth features via an attention mechanism, and a cross enhancement module (CEM), which enables effective interaction between RGB and depth features. In addition, this paper proposes a two-stage cross-modal fusion strategy: high-level fusion exploits high-level semantic information to coarsely localize salient regions, while low-level fusion makes full use of low-level detail through boundary fusion; the high-level and low-level cross-modal features are then progressively refined to obtain the final saliency prediction map. Extensive experiments show that the proposed model outperforms eight state-of-the-art models on five standard datasets.
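To make the described encoder interaction concrete, below is a minimal PyTorch-style sketch of how a DEM (attention-based filtering of depth features) and a CEM (bi-directional RGB-depth enhancement) could be realized at one encoder stage. The class names match the abstract, but the channel sizes, the squeeze-and-excitation-style channel attention, and the multiplicative cross-enhancement are illustrative assumptions, not the authors' actual implementation.

import torch
import torch.nn as nn

class DEM(nn.Module):
    """Depth enhancement module (sketch): suppresses noise in depth
    features with channel attention followed by spatial attention."""
    def __init__(self, channels: int):
        super().__init__()
        # Channel attention: squeeze-and-excitation-style gating (assumed design).
        self.channel_att = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // 4, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial attention over pooled channel statistics (assumed design).
        self.spatial_att = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, depth_feat: torch.Tensor) -> torch.Tensor:
        feat = depth_feat * self.channel_att(depth_feat)
        pooled = torch.cat(
            [feat.mean(dim=1, keepdim=True),
             feat.max(dim=1, keepdim=True).values],
            dim=1,
        )
        return feat * self.spatial_att(pooled)

class CEM(nn.Module):
    """Cross enhancement module (sketch): each stream is enhanced by a
    multiplicative cue from the other, then refined with a residual conv."""
    def __init__(self, channels: int):
        super().__init__()
        self.rgb_conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.depth_conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, rgb_feat, depth_feat):
        rgb_out = rgb_feat + self.rgb_conv(rgb_feat * depth_feat)
        depth_out = depth_feat + self.depth_conv(depth_feat * rgb_feat)
        return rgb_out, depth_out

# One feature-interaction step at a single encoder stage:
# depth features are denoised by the DEM before the bi-directional CEM exchange.
dem, cem = DEM(64), CEM(64)
rgb = torch.randn(1, 64, 56, 56)
depth = torch.randn(1, 64, 56, 56)
rgb, depth = cem(rgb, dem(depth))

The key design point this sketch illustrates is ordering: the depth stream is filtered before any cross-modal exchange, so low-quality depth cues are attenuated rather than propagated into the RGB features.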