Visual camera relocalization using both hand-crafted and learned features

Junyi Wang, Yue Qi

doi:10.1016/j.patcog.2023.109914

Abstract

Camera localization is essential in AR, MR, and robotics. Depending on the task, existing pipelines predict the camera pose with either hand-crafted or learned features. Each kind of feature has its own strengths and weaknesses in localization, yet few current frameworks consider both simultaneously. In this study, a novel relocalization pipeline for RGB or RGB-D input is proposed, comprising a coarse stage with learned features, a refinement stage with hand-crafted features, and a stabilizing process that measures the confidence of both stages to improve localization robustness. Instead of directly regressing the camera pose, the coarse stage obtains the initial result through registration between the known source point cloud and a predicted, weighted target point cloud. To this end, we design a deep network called PGNet, which constructs the weighted target point cloud from the image and previous poses. Moreover, to handle dynamic surroundings, we add a segmentation branch that classifies each point as either fixed or dynamic, and a corresponding segmentation-extended Chamfer Distance is added to optimize PGNet. For pose refinement, a feature space is established via hand-crafted feature extraction and matching on the training set. Starting from the coarse pose, we obtain the accurate pose by applying the Kabsch or Perspective-n-Point (PnP) algorithm to point-to-point correspondences built by searching this space and matching Oriented FAST and Rotated BRIEF (ORB) features. Furthermore, an additional process that defines coarse and refinement metrics is presented to achieve more stable performance. Finally, experiments on both static and dynamic scenes are conducted. On the one hand, the results demonstrate state-of-the-art performance compared with existing methods on 7 Scenes, INDOOR-6, Cambridge Landmarks and TUM RGB-D. On the other hand, the positive effects of the pose learning part, the dynamic branch, confidence regression and the hand-crafted-feature-based refinement are also demonstrated.
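For readers unfamiliar with the Chamfer Distance mentioned above, the following is a minimal sketch of the standard symmetric form between two point clouds. It is not the segmentation-extended variant used to optimize PGNet, whose definition is not given in the abstract; the function name `chamfer_distance`, the brute-force nearest-neighbour search, and the toy point sets are assumptions made purely for illustration.

```python
from math import dist  # Euclidean distance, available since Python 3.8


def chamfer_distance(source, target):
    """Standard symmetric Chamfer Distance between two non-empty point sets.

    In each direction, average the squared distance from every point to
    its nearest neighbour in the other set, then sum both directions.
    Illustrative brute-force version, O(len(source) * len(target)).
    """
    if not source or not target:
        raise ValueError("both point clouds must be non-empty")

    def one_sided(a, b):
        # Mean squared nearest-neighbour distance from set a to set b.
        return sum(min(dist(p, q) ** 2 for q in b) for p in a) / len(a)

    return one_sided(source, target) + one_sided(target, source)


# Hypothetical toy data: a known cloud and a slightly displaced prediction.
known = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
print(chamfer_distance(known, predicted))
```

The distance is zero when the two clouds coincide and grows as predicted points drift away from their nearest known points, which is what makes it usable as a training signal for a point-cloud prediction network. A segmentation-aware extension would presumably treat fixed and dynamic points differently, but that detail is left to the full text.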

Full Text