Abstract

Most existing RGB-D salient object detection (SOD) methods are primarily focusing on cross-modal and cross-level saliency fusion, which has been proved to be efficient and effective. However, these methods still have a critical limitation, i.e., their fusion patterns - typically the combination of selective characteristics and its variations, are too highly dependent on the network's non-linear adaptability. In such methods, the balances between RGB and D (Depth) are formulated individually considering the intermediate feature slices, but the relation at the modality level may not be learned properly. The optimal RGB-D combinations differ depending on the RGB-D scenarios, and the exact complementary status is frequently determined by multiple modality-level factors, such as D quality, the complexity of the RGB scene, and degree of harmony between them. Therefore, given the existing approaches, it may be difficult for them to achieve further performance breakthroughs, as their methodologies belong to some methods that are somewhat less modality sensitive. To conquer this problem, this paper presents the Modality-aware Decoder (MaD). The critical technical innovations include a series of feature embedding, modality reasoning, and feature back-projecting and collecting strategies, all of which upgrade the widely-used multi-scale and multi-level decoding process to be modality-aware. Our MaD achieves competitive performance over other state-of-the-art (SOTA) models without using any fancy tricks in the decoder's design. Codes and results will be publicly available at https://github.com/MengkeSong/MaD.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.