Abstract

RGB-D salient object detection (SOD) aims at detecting general attention-grabbing objects from paired RGB and depth image inputs, and has recently attracted increasing research attention. Although many advanced RGB-D SOD models have been proposed, almost all of them are developed in a fully supervised manner on a small training dataset that typically has only hundreds of RGB-D samples. This inevitably incurs poor generalizability when these models are applied to real-world scenarios and applications. To narrow such a gap, we make the first attempt to treat RGB-D SOD as a few-shot learning (FSL) problem, and improve it by introducing extra prior knowledge from a closely related task, i.e., RGB SOD. Inspired by the general taxonomy of FSL techniques, we investigate two perspectives, namely model and data, for transferring additional knowledge from the RGB SOD dataset to enhance RGB-D SOD performance. For the former, we employ multi-task learning with parameter sharing to constrain the model space, whereas for the latter, we propose to generate the depth from RGB by using an off-the-shelf depth estimator. Representative middle-fusion and late-fusion models are trialed and validated under such an FSL setup. Our experimental results and analyses confirm the feasibility of promoting RGB-D SOD via FSL techniques, and a comparative study of different FSL techniques and detection strategies is conducted. We hope this work can serve as a catalyst for bringing RGB-D saliency detection into real applications, as well as for inspiring future works that apply few-shot learning to saliency detection and other multi-modal detection tasks.
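To make the "model" perspective concrete, the following is a minimal NumPy sketch of multi-task learning with parameter sharing: both the auxiliary RGB SOD task and the target RGB-D SOD task pass through one shared encoder (constraining the model space), and only the prediction heads are task-specific, with the depth cue fused late at the RGB-D head. All layer sizes, weight names, and the toy architecture here are hypothetical illustrations, not the paper's actual networks.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    # Simple nonlinearity for the toy encoder.
    return np.maximum(x, 0.0)

# Shared encoder weights, used by BOTH tasks (parameter sharing).
W_shared = rng.standard_normal((64, 32)) * 0.1

# Task-specific head weights (hypothetical sizes).
W_rgb_head = rng.standard_normal((32, 1)) * 0.1    # auxiliary RGB SOD head
W_rgbd_head = rng.standard_normal((33, 1)) * 0.1   # RGB-D head: +1 input for the depth cue

def rgb_sod(feat_rgb):
    """Auxiliary task: saliency score from RGB features alone."""
    h = relu(feat_rgb @ W_shared)   # shared encoder
    return h @ W_rgb_head

def rgbd_sod(feat_rgb, depth_cue):
    """Target task: same shared encoder, depth cue fused at the head (late fusion)."""
    h = relu(feat_rgb @ W_shared)   # identical shared encoder weights
    return np.concatenate([h, depth_cue], axis=1) @ W_rgbd_head

feat = rng.standard_normal((4, 64))   # a batch of 4 RGB feature vectors
depth = rng.standard_normal((4, 1))   # one depth cue per sample (e.g., from a depth estimator)

print(rgb_sod(feat).shape)            # per-sample saliency scores for the RGB task
print(rgbd_sod(feat, depth).shape)    # per-sample saliency scores for the RGB-D task
```

Because gradients from both tasks would update `W_shared`, the abundant RGB SOD data regularizes the representation used by the data-scarce RGB-D task, which is the intuition behind constraining the model space via sharing.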
