Abstract

Camera relocalization is a challenging task, especially when only a sparse 3D map or a few keyframes are available. In this paper, we present an accurate method for RGB camera relocalization against a very sparse 3D map built from a limited number of keyframes. The core of our approach is a top-down feature matching strategy that provides a set of accurate 2D-to-3D matches. Specifically, we first use a landmark-based place recognition method to retrieve from the keyframes the images similar to the current view, along with a set of pairwise matched landmarks. This step constrains the 3D model points that can be matched with the current view. The points are then matched within the landmark pairs and the matches are combined afterward. This contrasts with conventional feature matching methods, which typically match points between entire images and the whole 3D map and which, as a result, may not be robust to large viewpoint changes, the main challenge of relocalization with a sparse map. After feature matching, the camera pose is computed by a novel, efficient Perspective-n-Point (PnP) algorithm. We conduct experiments on challenging datasets to demonstrate that the camera poses estimated by our method from the sparse 3D point cloud are more accurate than those of classical methods that use a dense map or a large number of training images.
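To make the final pose-computation step concrete, the sketch below recovers a camera pose from noise-free 2D-to-3D matches with a textbook Direct Linear Transform. This is a minimal illustration of PnP in general, not the paper's own novel PnP algorithm (whose details are not given here), and all function and variable names are ours.

```python
import numpy as np

def dlt_pnp(pts3d, pts2d, K):
    """Recover a camera pose [R|t] from n >= 6 noise-free 2D-to-3D matches
    with a Direct Linear Transform (an illustrative stand-in for PnP).
    pts3d: (n, 3) world points; pts2d: (n, 2) pixels; K: 3x3 intrinsics."""
    n = pts3d.shape[0]
    # Normalize pixels with the intrinsics so the projection reduces to [R|t].
    xn = (np.linalg.inv(K) @ np.hstack([pts2d, np.ones((n, 1))]).T).T
    A = []
    for (X, Y, Z), (x, y, _) in zip(pts3d, xn):
        # Each match contributes two linear constraints on the 12 entries of P.
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -x * X, -x * Y, -x * Z, -x])
        A.append([0, 0, 0, 0, X, Y, Z, 1, -y * X, -y * Y, -y * Z, -y])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    P = Vt[-1].reshape(3, 4)           # solution up to an unknown scale
    if np.linalg.det(P[:, :3]) < 0:    # fix the sign so points lie in front
        P = -P
    U, S, Vt2 = np.linalg.svd(P[:, :3])
    R = U @ Vt2                        # project onto the nearest rotation
    t = P[:, 3] / S.mean()             # undo the scale on the translation
    return R, t
```

A practical system would wrap such a solver in RANSAC to reject the outlier matches that inevitably survive feature matching.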

Highlights

  • Camera relocalization plays a critical role in robotics, including mobile robot navigation, simultaneous localization and mapping (SLAM), and autonomous driving [1]–[5]

  • In this paper, we focus on camera relocalization in the case where only limited keyframes and very sparse maps are available

  • We evaluate our relocalization method on the Microsoft 7 Scenes dataset [40], which includes 7 scenes recorded with a Kinect RGB-D camera

Introduction

Camera relocalization plays a critical role in robotics, including mobile robot navigation, simultaneous localization and mapping (SLAM), and autonomous driving [1]–[5]. Given a pre-built map or a series of images with known poses, it aims to determine the 6D pose of the current camera view. Recent research on camera relocalization roughly falls into two classes: appearance-based and deep-learning-based methods. In the former [1]–[8], place recognition techniques such as the bag-of-words (BoW) model [9] are used to retrieve the images most similar to the current view, from which a coarse pose of the current view can be obtained.
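To make the retrieval step concrete, here is a minimal sketch of BoW-style image retrieval, assuming local descriptors have already been quantized into visual-word IDs against some vocabulary. It illustrates the general tf-idf ranking technique cited above, not the paper's landmark-based implementation; all names are hypothetical.

```python
import numpy as np

def tfidf_bow(word_lists, vocab_size):
    """L2-normalized tf-idf histograms from per-image visual-word ID lists.
    Assumes descriptors were already quantized against a visual vocabulary."""
    tf = np.zeros((len(word_lists), vocab_size))
    for i, ids in enumerate(word_lists):
        for w in ids:
            tf[i, w] += 1.0                          # term frequency
    df = (tf > 0).sum(axis=0)                        # document frequency
    idf = np.log(len(word_lists) / np.maximum(df, 1))
    v = tf * idf                                     # down-weight common words
    return v / np.maximum(np.linalg.norm(v, axis=1, keepdims=True), 1e-12)

def rank_keyframes(query_words, db_word_lists, vocab_size):
    """Rank database keyframes by cosine similarity to the query view."""
    vecs = tfidf_bow(list(db_word_lists) + [query_words], vocab_size)
    sims = vecs[:-1] @ vecs[-1]          # cosine similarity (unit vectors)
    return np.argsort(-sims), sims
```

The top-ranked keyframes then limit which 3D model points need to be considered when matching against the current view.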
