Abstract
How to effectively exchange and aggregate information from multiple modalities (e.g., RGB images and depth maps) is a major challenge in the RGB-D salient object detection community. To address this problem, in this paper we propose a cross-modal Hierarchical Interaction Network (HINet), which boosts salient object detection by exploiting cross-modal feature interaction and progressive multi-level feature fusion. To this end, we design two modules: a cross-modal information exchange (CIE) module and a multi-level information progressively guided fusion (PGF) module. Specifically, the CIE module exchanges cross-modal features to learn shared representations and provides beneficial feedback that facilitates discriminative feature learning in each modality. The PGF module aggregates hierarchical features progressively with a reverse guidance mechanism, in which high-level feature fusion guides low-level feature fusion, thereby improving saliency detection performance. Extensive experiments show that our proposed model significantly outperforms nine state-of-the-art models on five challenging benchmark datasets. Code and results are available at: https://github.com/RanwanWu/HINet.
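The abstract only outlines the two modules at a high level; the official implementation is in the linked repository. The PyTorch sketch below is a minimal illustration of the two ideas as described here, not the authors' code: the module names `CrossModalExchange` and `ProgressiveGuidedFusion`, the gating scheme, and all layer shapes are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossModalExchange(nn.Module):
    """Hypothetical sketch of cross-modal information exchange:
    each branch is refined by a gated share of the other branch's features."""

    def __init__(self, channels):
        super().__init__()
        self.rgb_gate = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.depth_gate = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())

    def forward(self, f_rgb, f_depth):
        # Exchange: each modality receives feedback from the other modality.
        f_rgb_out = f_rgb + self.depth_gate(f_depth) * f_depth
        f_depth_out = f_depth + self.rgb_gate(f_rgb) * f_rgb
        return f_rgb_out, f_depth_out


class ProgressiveGuidedFusion(nn.Module):
    """Hypothetical sketch of reverse-guided fusion: the higher-level fused
    feature is upsampled and used to guide the lower-level fusion."""

    def __init__(self, channels):
        super().__init__()
        self.fuse = nn.Conv2d(2 * channels, channels, 3, padding=1)

    def forward(self, f_low, f_high):
        # Upsample the high-level fusion result to the low-level resolution.
        guide = F.interpolate(f_high, size=f_low.shape[2:], mode="bilinear",
                              align_corners=False)
        # High-level guidance gates the low-level feature before fusion.
        gated_low = f_low * torch.sigmoid(guide)
        return self.fuse(torch.cat([gated_low, guide], dim=1))


if __name__ == "__main__":
    cie = CrossModalExchange(64)
    pgf = ProgressiveGuidedFusion(64)
    rgb, depth = torch.randn(1, 64, 44, 44), torch.randn(1, 64, 44, 44)
    f_rgb, f_depth = cie(rgb, depth)
    high = torch.randn(1, 64, 22, 22)   # fused feature from a deeper level
    fused = pgf(f_rgb + f_depth, high)  # guided low-level fusion
    print(fused.shape)                  # torch.Size([1, 64, 44, 44])
```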