Abstract
RGB-D data including paired RGB color images and depth maps is widely used in downstream computer vision tasks. However, compared with the acquisition of high-resolution color images, the depth maps captured by consumer-level sensors are always in low resolution. Within decades of research, the most state-of-the-art (SOTA) methods of depth map super-resolution cannot adaptively tune the guidance fusion for all feature positions by channel-wise feature concatenation with spatially sharing convolutional kernels. This paper proposes JTFNet to resolve this issue, which simulates the traditional Joint Trilateral Filter (JTF). Specifically, a novel JTF block is introduced to adaptively tune the fusion pattern between the color features and the depth features for all feature positions. Moreover, based on the variant of JTF block whose target features and guidance features are in the cross-scale shape, the fusion for depth features is performed in a bi-directional way. Therefore, the error accumulation along scales can be effectively mitigated by iteratively HR feature guidance. Compared with the SOTA methods, the sufficient experiment is conducted on the mainstream synthetic datasets and real datasets, i.e., Middlebury, NYU and ToF-Mark, which shows remarkable improvement of our JTFNet.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.