Abstract

RGB-D salient object detection aims to integrate multimodal feature information for accurate salient region localization. Despite the development of several RGB-D salient object detection models, existing methods struggle to fuse RGB and Depth features effectively and to exploit their complementary information. To address this challenge, this study introduces MFUR-Net, a network based on multimodal feature fusion and unimodal feature refinement. The contributions of this study are primarily threefold. First, a multimodal multilevel feature fusion module is proposed at the encoder stage to integrate multimodal and multilevel features, generating enhanced RGB-D features. Second, a multi-input feature aggregation module is introduced at the decoder stage, which incorporates the RGB and Depth feature streams into the RGB-D stream so that all three streams collaborate to learn more discriminative information about the salient object. Third, a unimodal saliency feature refinement module refines the saliency feature information of each modality and eliminates redundancy before the feature streams enter the decoder. Through this gradual refinement of saliency features, MFUR-Net achieves accurate saliency map prediction at the decoder stage. The method has been validated through extensive experiments on seven widely used benchmark datasets, demonstrating clear advantages over existing state-of-the-art techniques on key performance metrics. The source code is available at https://github.com/wangwei678/MFUR-Net.
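
The following is a minimal, illustrative sketch (not the authors' implementation) of how the three described components could interact in PyTorch: an encoder-stage fusion of RGB and Depth features into an RGB-D feature, a unimodal refinement step applied to each stream, and a decoder-stage aggregation of all three streams into a coarse saliency prediction. All module names, layer choices, and channel sizes here are assumptions for exposition only.

```python
import torch
import torch.nn as nn


class MultimodalFusion(nn.Module):
    """Hypothetical stand-in for the multimodal multilevel feature fusion module:
    combines same-level RGB and Depth features into an enhanced RGB-D feature."""
    def __init__(self, channels):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, rgb_feat, depth_feat):
        return self.fuse(torch.cat([rgb_feat, depth_feat], dim=1))


class StreamRefinement(nn.Module):
    """Hypothetical stand-in for the unimodal saliency feature refinement module:
    re-weights a single stream with channel attention before it enters the decoder."""
    def __init__(self, channels):
        super().__init__()
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, feat):
        return feat * self.attn(feat)


class MultiInputAggregation(nn.Module):
    """Hypothetical stand-in for the multi-input feature aggregation module:
    lets the refined RGB and Depth streams collaborate with the RGB-D stream
    to produce a decoder feature and a coarse saliency map."""
    def __init__(self, channels):
        super().__init__()
        self.merge = nn.Sequential(
            nn.Conv2d(3 * channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        self.predict = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, rgb_feat, depth_feat, rgbd_feat):
        merged = self.merge(torch.cat([rgb_feat, depth_feat, rgbd_feat], dim=1))
        return merged, torch.sigmoid(self.predict(merged))


if __name__ == "__main__":
    c = 64
    rgb = torch.randn(1, c, 56, 56)    # one encoder level of the RGB stream
    depth = torch.randn(1, c, 56, 56)  # matching level of the Depth stream

    rgbd = MultimodalFusion(c)(rgb, depth)             # encoder-stage fusion
    rgb_r = StreamRefinement(c)(rgb)                   # refine unimodal streams
    depth_r = StreamRefinement(c)(depth)
    _, saliency = MultiInputAggregation(c)(rgb_r, depth_r, rgbd)  # decoder-stage aggregation
    print(saliency.shape)  # torch.Size([1, 1, 56, 56])
```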
