Abstract

RGB-D scene recognition has achieved promising performance because depth provides geometric information complementary to RGB images. However, the limited availability of depth sensors severely restricts RGB-D applications. In this paper, we focus on the depth-privileged setting, in which depth information is available during training but not during testing. Since the information in RGB and depth images is complementary, and attention is both informative and transferable, our idea is to use the RGB input to hallucinate depth attention. We build our model upon the modulated deformable convolutional layer and hallucinate dual attention: post-hoc importance weights and trainable spatial transformations. Specifically, we use the modulation (resp., offset) learned from RGB to mimic the Grad-CAM (resp., offset) learned from depth, combining the strengths of both forms of attention. We also design a weighted loss that, according to the quality of the depth attention, down-weights the transfer term to avoid negative transfer. Extensive experiments on two benchmarks, i.e., SUN RGB-D and NYUDv2, demonstrate that our method outperforms state-of-the-art methods for depth-privileged scene recognition.
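
The abstract does not spell out the implementation, but the core mechanism can be illustrated with a minimal sketch, assuming PyTorch and torchvision's modulated deformable convolution (deform_conv2d with a mask argument). All names here (RGBHallucinationBlock, hallucination_loss, the per-sample quality weight w) are hypothetical and not taken from the paper; this shows only the general pattern of predicting offset and modulation from RGB features and matching them to depth-derived targets.

```python
# Minimal sketch of dual-attention hallucination, assuming PyTorch/torchvision.
# Module and function names are hypothetical illustrations, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.ops import deform_conv2d


class RGBHallucinationBlock(nn.Module):
    """Modulated deformable conv whose offset (spatial transformation) and
    modulation (importance weights) are predicted from RGB features."""

    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.k = k
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.01)
        # Offset head: 2 coordinates (dx, dy) per kernel sampling location.
        self.offset_head = nn.Conv2d(in_ch, 2 * k * k, 3, padding=1)
        # Modulation head: one importance weight per kernel sampling location.
        self.mod_head = nn.Conv2d(in_ch, k * k, 3, padding=1)

    def forward(self, x):
        offset = self.offset_head(x)            # (N, 2*k*k, H, W)
        mod = torch.sigmoid(self.mod_head(x))   # (N, k*k, H, W), in [0, 1]
        out = deform_conv2d(x, offset, self.weight,
                            padding=self.k // 2, mask=mod)
        return out, offset, mod


def hallucination_loss(offset_rgb, mod_rgb, offset_depth, gradcam_depth, w):
    """Train RGB-predicted attention to mimic depth attention.

    gradcam_depth: (N, 1, H, W) Grad-CAM map from the depth network, broadcast
    against the RGB modulation; offset_depth: offsets from the depth branch.
    w: per-sample weight in [0, 1] reflecting depth-attention quality, so that
    unreliable depth attention is down-weighted to avoid negative transfer.
    """
    l_mod = F.mse_loss(mod_rgb, gradcam_depth.expand_as(mod_rgb),
                       reduction="none").mean(dim=(1, 2, 3))
    l_off = F.mse_loss(offset_rgb, offset_depth,
                       reduction="none").mean(dim=(1, 2, 3))
    return (w * (l_mod + l_off)).mean()
```

At test time only the RGB branch and its predicted offset/modulation are used, so no depth input is required, which is what makes the setting depth-privileged rather than RGB-D.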
