Abstract

Mobile devices are usually equipped with a depth sensor to help resolve ill-posed problems such as salient object detection against cluttered backgrounds. The main obstacle to exploiting RGBD data is handling information from two different modalities. To address this problem, we propose a boundary-aware cross-modal fusion network for RGBD salient object detection. In particular, to enhance the fusion of color and depth features, we present a cross-modal feature sampling module that balances the contributions of the RGB and depth features based on the statistics of their channel values. In addition, in our multi-scale dense fusion network architecture, we not only incorporate edge-sensitive losses to preserve the boundary of the detected salient region, but also refine its structure by merging the estimated saliency maps of different scales. We implement the multi-scale saliency map merging with two alternative methods, which produce refined saliency maps via per-pixel weighted combination and via an encoder-decoder network, respectively. Extensive experimental evaluations demonstrate that the proposed framework achieves state-of-the-art performance on several public RGBD datasets.
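
To make the per-pixel weighted combination concrete, the following is a minimal illustrative sketch, not the implementation used in this work. It assumes the saliency maps from the different scales have already been resized to a common resolution and that a weight score is available for each pixel of each map. Normalizing the scores with a softmax is an assumption made here for illustration, and the function name `merge_saliency_maps` and its arguments are hypothetical.

```python
import math


def merge_saliency_maps(maps, scores):
    """Merge same-sized saliency maps using per-pixel softmax weights.

    maps:   list of S saliency maps, each an H x W nested list in [0, 1]
    scores: list of S weight-score maps of the same shape (hypothetical:
            a network would predict these; here they are given by hand)
    """
    num_scales = len(maps)
    height, width = len(maps[0]), len(maps[0][0])
    merged = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            pixel_scores = [scores[s][y][x] for s in range(num_scales)]
            # Subtract the maximum before exponentiating for numerical stability.
            peak = max(pixel_scores)
            exps = [math.exp(v - peak) for v in pixel_scores]
            total = sum(exps)
            # Weights are non-negative and sum to 1 at every pixel, so the
            # result is a convex combination of the per-scale predictions.
            merged[y][x] = sum(
                (e / total) * maps[s][y][x] for s, e in enumerate(exps)
            )
    return merged


# Two 1 x 2 maps with equal scores: each pixel becomes the plain average.
coarse = [[0.2, 0.8]]
fine = [[0.6, 0.4]]
equal_scores = [[[0.0, 0.0]], [[0.0, 0.0]]]
merged = merge_saliency_maps([coarse, fine], equal_scores)
```

Because the weights at each pixel form a convex combination, the merged value always lies between the smallest and largest per-scale prediction at that pixel, so merging cannot push a saliency value outside [0, 1].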
