Abstract
Interest point detection and description are highly challenging in indoor environments with repetitive or sparse textures and strong illumination changes (referred to as challenging indoor environments, CIEs). In such environments, mismatched or misaligned feature points are a severe problem, often degrading the accuracy of indoor applications such as SLAM. To address this issue, we propose a self-supervised RGB-D cross-modal fusion network (RDFNet) for feature extraction. In RDFNet, a dual-stream structure builds a pseudo-Siamese network that processes color and depth images simultaneously, while a new two-stage cross-modal reweighted fusion method (TCRF) fuses the RGB and depth features. TCRF achieves effective fusion in two steps: (1) applying the reweighting idea to compositely enhance the RGB features with the depth features at both the low-level and high-level stages; (2) concatenating the enhanced RGB and depth features. In addition, we add a uniform distribution loss function to encourage uniform extraction of feature points. To verify the performance of the proposed model, a new test dataset of specific indoor scenes is created to evaluate it against other state-of-the-art methods. Experimental results demonstrate its excellent performance in challenging indoor scenarios.
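The abstract does not specify how TCRF or the uniform distribution loss are implemented. Below is a minimal, hypothetical PyTorch sketch of one plausible reading: depth features produce per-channel sigmoid gates that reweight the RGB features at a low-level and a high-level stage before the final concatenation, and the loss penalizes non-uniform spatial mass in the keypoint heatmap. The gating design, the low_proj stand-in for the intermediate backbone layers, the grid size, and the loss formulation are all illustrative assumptions, not the paper's actual architecture.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TCRF(nn.Module):
    """Illustrative sketch of a two-stage cross-modal reweighted fusion."""
    def __init__(self, low_ch: int, high_ch: int):
        super().__init__()
        # Hypothetical gates: depth features -> per-channel weights in (0, 1).
        self.low_gate = nn.Sequential(nn.Conv2d(low_ch, low_ch, 1), nn.Sigmoid())
        self.high_gate = nn.Sequential(nn.Conv2d(high_ch, high_ch, 1), nn.Sigmoid())
        # Stand-in for the backbone layers between the two stages: projects the
        # enhanced low-level RGB features to the high-level shape (assumes the
        # high-level features have half the spatial resolution).
        self.low_proj = nn.Conv2d(low_ch, high_ch, 3, stride=2, padding=1)

    def forward(self, rgb_low, depth_low, rgb_high, depth_high):
        # Step 1: reweight RGB features by depth-derived gates at both stages
        # (the "composite" enhancement: the low-level result feeds the high level).
        rgb_low = rgb_low * self.low_gate(depth_low)
        rgb_high = rgb_high * self.high_gate(depth_high) + self.low_proj(rgb_low)
        # Step 2: concatenate the enhanced RGB and depth features.
        return torch.cat([rgb_high, depth_high], dim=1)

def uniform_distribution_loss(heatmap: torch.Tensor, grid: int = 8) -> torch.Tensor:
    # Hypothetical reading of the uniform distribution loss: split the keypoint
    # probability map into grid x grid cells and penalize the variance of the
    # per-cell mass, encouraging spatially uniform detections.
    cells = F.adaptive_avg_pool2d(heatmap, grid)   # (B, 1, grid, grid)
    return cells.var(dim=(1, 2, 3)).mean()

if __name__ == "__main__":
    tcrf = TCRF(low_ch=64, high_ch=128)
    fused = tcrf(torch.randn(1, 64, 64, 64), torch.randn(1, 64, 64, 64),
                 torch.randn(1, 128, 32, 32), torch.randn(1, 128, 32, 32))
    print(fused.shape)                                   # torch.Size([1, 256, 32, 32])
    print(uniform_distribution_loss(torch.rand(1, 1, 64, 64)).item())

Concatenation (rather than addition) in the second step keeps the depth descriptor information intact alongside the enhanced RGB features, which matches the abstract's wording; everything else here is a sketch under the stated assumptions.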