Abstract

Long-term visual localization is a challenging problem in practice: a mobile robot must estimate its pose from images captured by its onboard camera while continuously navigating complex scenes. Semantic information in an image is largely invariant to environmental change and can be used to generate robust scene descriptors, but the performance of Convolutional Neural Network (CNN) based semantic segmentation depends heavily on semantic labels; the trained model generalizes poorly, and labeling large-scale scene images is labor-intensive. To address these problems, this paper proposes a new long-term visual localization method that fuses depth and semantic information in the scene. Its novelty lies in: (1) a module that fuses depth and semantic information to extract a scene representation that remains invariant as the environment changes, which effectively improves the robustness of long-term visual localization, and (2) a domain adaptation module with an adversarial loss that adapts from a virtual dataset to a real dataset, requiring no manual semantic label annotation and generalizing to more realistic application scenarios. Results show that our method outperforms state-of-the-art baselines under various challenging environments on the Extended CMU Seasons and RobotCar Seasons datasets on specific precision metrics.
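The abstract does not give implementation details of the adversarial domain adaptation module, so the following is only a minimal sketch of the general technique it names: a domain discriminator trained with an adversarial (gradient-reversal) loss to make features from a synthetic source domain and a real target domain indistinguishable. All class names, channel sizes, and tensor shapes here are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn


class GradientReversal(torch.autograd.Function):
    """Identity in the forward pass; reverses (and scales) gradients backward."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


class DomainDiscriminator(nn.Module):
    """Small classifier predicting whether a feature map comes from the
    synthetic (source) or real (target) domain."""

    def __init__(self, in_channels=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 128, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(128, 1),
        )

    def forward(self, features, lambd=1.0):
        # Gradient reversal makes the feature extractor maximize the
        # discriminator's loss, pushing it toward domain-invariant features.
        reversed_feats = GradientReversal.apply(features, lambd)
        return self.net(reversed_feats)


if __name__ == "__main__":
    bce = nn.BCEWithLogitsLoss()
    disc = DomainDiscriminator(in_channels=256)

    # Hypothetical fused depth+semantic feature maps from the two domains.
    source_feats = torch.randn(4, 256, 32, 32)  # synthetic, labeled
    target_feats = torch.randn(4, 256, 32, 32)  # real, unlabeled

    src_logits = disc(source_feats)
    tgt_logits = disc(target_feats)

    # Adversarial domain loss: the discriminator separates the domains while
    # the reversed gradients drive the backbone to confuse it.
    domain_loss = bce(src_logits, torch.zeros_like(src_logits)) + \
                  bce(tgt_logits, torch.ones_like(tgt_logits))
    print(float(domain_loss))
```

In a setup like this, the segmentation loss would be computed only on the labeled synthetic images, while the adversarial domain loss above is computed on both domains, which is what removes the need for manual labels on the real data.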
