Abstract

RGB-D salient object detection (SOD) segments the most salient objects in a scene by fusing RGB images with depth maps. Because the original depth map contains inherent noise, fusion can fail, creating a performance bottleneck. To address this issue, this paper proposes a mutual learning and boosting segmentation network (MLBSNet) for RGB-D salient object detection, which consists of a deep optimization module (DOM), a semantic alignment module (SAM), a cross-modal integration (CMI) module, and a separate reconstruct decoder (SRD). Specifically, the deep optimization module obtains improved depth information by learning the similarity between the original and predicted depth maps. To reduce the uncertainty of neighboring single-modal features and capture the complementary features of the two modalities, a semantic alignment module and a cross-modal integration module are introduced. Finally, a separate reconstruct decoder built on a multi-source feature integration mechanism is constructed to overcome the accuracy loss caused by segmentation. In comparative experiments, our method outperforms 13 existing methods on five RGB-D datasets and achieves excellent performance on four evaluation metrics.
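To make the pipeline described above concrete, the following is a minimal PyTorch-style sketch of the four stages (DOM, SAM, CMI, SRD). All module internals, channel sizes, and fusion operators here are illustrative assumptions, not the authors' implementation; the similarity gating, channel attention, and single-stage decoder are simplified stand-ins for the mechanisms the abstract names.

```python
# Hypothetical sketch of the MLBSNet pipeline described in the abstract.
# Module internals are assumptions for illustration; the actual design may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DepthOptimizationModule(nn.Module):
    """DOM: predict a depth map from RGB features and blend it with the
    noisy input depth, weighted by their per-pixel similarity (assumed form)."""
    def __init__(self, channels: int):
        super().__init__()
        self.depth_head = nn.Conv2d(channels, 1, kernel_size=3, padding=1)

    def forward(self, rgb_feat, raw_depth):
        pred_depth = torch.sigmoid(self.depth_head(rgb_feat))
        # Similarity gate: trust the raw depth more where it agrees with the prediction.
        sim = 1.0 - torch.abs(pred_depth - raw_depth)
        return sim * raw_depth + (1.0 - sim) * pred_depth


class SemanticAlignmentModule(nn.Module):
    """SAM: re-weight single-modal features to reduce their uncertainty
    (a simple channel-attention gate as a stand-in)."""
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, feat):
        return feat * self.gate(feat)


class CrossModalIntegration(nn.Module):
    """CMI: fuse RGB and depth features to capture complementary cues."""
    def __init__(self, channels: int):
        super().__init__()
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, rgb_feat, depth_feat):
        return F.relu(self.fuse(torch.cat([rgb_feat, depth_feat], dim=1)))


class SeparateReconstructDecoder(nn.Module):
    """SRD: integrate the fused features and reconstruct a full-resolution
    saliency map (a single prediction/upsampling stage shown for brevity)."""
    def __init__(self, channels: int):
        super().__init__()
        self.predict = nn.Conv2d(channels, 1, kernel_size=3, padding=1)

    def forward(self, fused_feat, out_size):
        sal = self.predict(fused_feat)
        return torch.sigmoid(F.interpolate(sal, size=out_size, mode="bilinear",
                                           align_corners=False))


if __name__ == "__main__":
    c, h, w = 32, 64, 64
    rgb_feat = torch.randn(1, c, h, w)    # features from an RGB encoder (assumed)
    depth_feat = torch.randn(1, c, h, w)  # features from a depth encoder (assumed)
    raw_depth = torch.rand(1, 1, h, w)    # noisy input depth map

    opt_depth = DepthOptimizationModule(c)(rgb_feat, raw_depth)
    rgb_a = SemanticAlignmentModule(c)(rgb_feat)
    dep_a = SemanticAlignmentModule(c)(depth_feat * opt_depth)  # gate by optimized depth
    fused = CrossModalIntegration(c)(rgb_a, dep_a)
    saliency = SeparateReconstructDecoder(c)(fused, out_size=(256, 256))
    print(saliency.shape)  # torch.Size([1, 1, 256, 256])
```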
