Abstract
Visual localization provides the basis for many robotics applications such as autonomous navigation or augmented reality. Especially in outdoor scenes, robust localization requires local features that can be reliably extracted and matched under changing conditions. Previous approaches have applied generative image-to-image translation models to align images in a single domain before correspondence search. In this letter, we invert this concept and explain why it is more promising to use image domain adaptation for training robust local features. Integrating this idea into a self-supervised training framework, we show in various experiments covering image matching, visual localization, and scene reconstruction that our Domain-Invariant SuperPoint (DISP) outperforms existing self-supervised methods in terms of repeatability, generalization, and robustness. In contrast to competing supervised local features, our modular and fully self-supervised approach can be easily adapted to different domains and localization tasks, as it does not require ground-truth correspondences for training.
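To make the inverted concept concrete, the following minimal PyTorch sketch (our illustration, not the authors' implementation) trains a SuperPoint-style network to produce consistent keypoints and descriptors on an image and its domain-translated counterpart, rather than translating images at test time. TinyFeatureNet, consistency_loss, and the identity stand-in for the translation model are hypothetical placeholders.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyFeatureNet(nn.Module):
    """Stand-in for a SuperPoint-style detector/descriptor network."""
    def __init__(self, desc_dim: int = 64):
        super().__init__()
        self.backbone = nn.Conv2d(1, 32, 3, padding=1)
        self.det_head = nn.Conv2d(32, 1, 1)          # keypoint heatmap
        self.desc_head = nn.Conv2d(32, desc_dim, 1)  # dense descriptors

    def forward(self, x):
        f = F.relu(self.backbone(x))
        heat = torch.sigmoid(self.det_head(f))
        desc = F.normalize(self.desc_head(f), dim=1)
        return heat, desc

def consistency_loss(net, translator, img):
    """Self-supervised loss on an (original, domain-translated) pair.

    `translator` is a frozen image-to-image translation model
    (e.g. day -> night); its output is pixel-aligned with `img`,
    so detections and descriptors should agree without any
    geometric warp.
    """
    with torch.no_grad():
        img_t = translator(img)                      # translated-domain view
    heat_a, desc_a = net(img)
    heat_b, desc_b = net(img_t)
    det_loss = F.mse_loss(heat_a, heat_b)            # same keypoints in both domains
    desc_loss = (1 - (desc_a * desc_b).sum(dim=1)).mean()  # cosine descriptor match
    return det_loss + desc_loss

# Usage with a dummy "translator" (identity stands in for a generative model):
net = TinyFeatureNet()
loss = consistency_loss(net, nn.Identity(), torch.rand(2, 1, 64, 64))
loss.backward()

In practice the actual method additionally uses geometric self-supervision (e.g. homographic warps, as in SuperPoint training); the sketch isolates only the domain-consistency idea that distinguishes training-time adaptation from test-time image translation.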