Abstract
Learning-based visual localization has become a promising research direction over the past decade. Since ground-truth pose labels are difficult to obtain, recent methods try to learn pose estimation networks from pixel-perfect synthetic data. However, this also introduces the problem of domain bias. In this paper, we first build the Tuebingen Buildings dataset of RGB images collected by a drone in urban scenes and create a 3D model for each scene. A large number of synthetic images are generated from these 3D models. We exploit image style transfer and cycle-consistent adversarial training to predict the relative camera poses of image pairs after training on synthetic environment data. We propose a relative camera pose estimation approach to solve the continuous localization problem for the autonomous navigation of unmanned systems. Unlike existing learning-based camera pose estimation methods that train and test in a single scene, our approach estimates the relative camera poses of multiple city locations with a single trained model. We evaluate our approach on the Tuebingen Buildings and the Cambridge Landmarks datasets, both within a single scene and across scenes. For each dataset, we compare the performance of models trained on real images against models trained on synthetic images. We also test our model on the indoor 7Scenes dataset to demonstrate its generalization ability.
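The cycle-consistent adversarial component mentioned above can be pictured with a minimal sketch. The PyTorch snippet below shows only the CycleGAN-style cycle-consistency term, which encourages a synthetic-to-real generator G and a real-to-synthetic generator F to be mutual inverses; the tiny generators, tensor sizes, and the weight lam are illustrative assumptions, not the paper's actual architecture or hyperparameters.

```python
import torch
import torch.nn as nn

# Placeholder generators standing in for the paper's (unspecified) networks.
def tiny_generator():
    return nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
        nn.Conv2d(16, 3, 3, padding=1), nn.Tanh(),
    )

G = tiny_generator()  # maps synthetic images toward the real domain
F = tiny_generator()  # maps real images back toward the synthetic domain
l1 = nn.L1Loss()

def cycle_consistency_loss(syn_batch, real_batch, lam=10.0):
    """Cycle loss: F(G(x)) should reconstruct x, and G(F(y)) should
    reconstruct y. lam is an illustrative weighting factor."""
    forward_cycle = l1(F(G(syn_batch)), syn_batch)
    backward_cycle = l1(G(F(real_batch)), real_batch)
    return lam * (forward_cycle + backward_cycle)

# Random tensors standing in for batches of synthetic and real images.
syn = torch.rand(4, 3, 64, 64)
real = torch.rand(4, 3, 64, 64)
print(cycle_consistency_loss(syn, real).item())
```

In the full CycleGAN formulation this cycle term is combined with adversarial losses from two discriminators; only the cycle term is sketched here.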
Highlights
Simultaneous localization and mapping (SLAM) has advanced rapidly in recent years and is widely used in augmented reality and robot navigation
Convolutional Neural Networks (CNNs) combined with structure from motion (SfM) reduce the workload of constructing a database, make deep-learning-based camera relocalization possible, and offer new solutions to the problems faced by traditional visual SLAM
The across-scenes-trained relative camera pose estimation network (RCPNet), shown in the last column of Table 2, has an average 5% decline compared with the individually trained RCPNet on both datasets, but remains comparable to PoseNet and RPNet in most scenes (a sketch of the weighted pose loss typically optimized by this family of networks follows this list)
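For a concrete reference point, pose regression networks in this family (PoseNet, RPNet, RCPNet) are commonly trained with a weighted loss combining position and orientation error. The sketch below is a minimal PyTorch version of that standard loss; the weight beta is a tunable hyperparameter (the PoseNet paper reported values roughly between 120 and 2000 depending on the scene), and nothing here reflects RCPNet's exact formulation.

```python
import torch

def pose_loss(t_pred, q_pred, t_gt, q_gt, beta=250.0):
    """Weighted pose regression loss: position error plus beta times the
    orientation error between unit quaternions. beta and all tensor
    shapes are illustrative, not RCPNet's actual settings."""
    t_err = torch.norm(t_pred - t_gt, dim=1)
    q_err = torch.norm(
        q_pred / q_pred.norm(dim=1, keepdim=True)
        - q_gt / q_gt.norm(dim=1, keepdim=True),
        dim=1,
    )
    return (t_err + beta * q_err).mean()

# Toy batch of two pose pairs standing in for network output and labels.
t_pred, q_pred = torch.rand(2, 3), torch.rand(2, 4)
t_gt, q_gt = torch.rand(2, 3), torch.rand(2, 4)
print(pose_loss(t_pred, q_pred, t_gt, q_gt).item())
```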
Summary
Simultaneous localization and mapping (SLAM) has advanced rapidly in recent years and is widely used in augmented reality and robot navigation. Visual SLAM has matured into applications on ground mobile robots and self-driving vehicles, but for the localization of unmanned aerial or ground systems, traditional visual SLAM faces several challenges: the high-speed movement of Unmanned Aerial Vehicles (UAVs) causes massive changes in viewpoint, which makes reliable feature tracking and matching difficult. Convolutional Neural Networks (CNNs) are widely applied in object recognition, image classification [8], and place recognition [9]. A CNN combined with SfM reduces the workload of constructing a database, makes deep-learning-based camera relocalization possible, and offers new solutions to the problems faced by traditional visual SLAM
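To make the relative-pose relocalization idea concrete, here is a minimal PyTorch sketch of a siamese relative-pose regressor of the kind described: a shared CNN encodes both images of a pair, and a small head regresses the relative translation and a unit quaternion. The class name, encoder, and layer sizes are illustrative assumptions, not the paper's RCPNet architecture.

```python
import torch
import torch.nn as nn

class RelativePoseNet(nn.Module):
    """Siamese relative-pose regressor (illustrative): a shared encoder
    embeds both images, and an MLP head outputs the relative translation
    (3-D) and rotation as a unit quaternion (4-D)."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),  # -> (B, 64) per image
        )
        self.head = nn.Sequential(
            nn.Linear(128, 256), nn.ReLU(),
            nn.Linear(256, 7),  # 3 translation + 4 quaternion components
        )

    def forward(self, img_a, img_b):
        feats = torch.cat([self.encoder(img_a), self.encoder(img_b)], dim=1)
        out = self.head(feats)
        t, q = out[:, :3], out[:, 3:]
        q = q / q.norm(dim=1, keepdim=True)  # normalize to a unit quaternion
        return t, q

net = RelativePoseNet()
a, b = torch.rand(2, 3, 128, 128), torch.rand(2, 3, 128, 128)
t, q = net(a, b)
print(t.shape, q.shape)  # torch.Size([2, 3]) torch.Size([2, 4])
```

Sharing encoder weights keeps the two views in a common feature space, which is the property that plausibly allows a single trained model to handle pairs from multiple city locations rather than a single scene.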