Purpose
Visual simultaneous localization and mapping (SLAM) methods suffer from accumulated errors, especially in challenging environments without loop closure. This study aims to propose a comprehensive coarse-to-fine six-degree-of-freedom (6-DoF) long-term visual relocalization method that assists SLAM in challenging environments and achieves more accurate pose estimation. By constructing lightweight offline maps and applying deep learning (DL)-based techniques in two stages, namely image retrieval and feature matching, the method reconstructs the 6-DoF relationship between SLAM sequences and map sequences.

Design/methodology/approach
First, image-level global feature matching and patch-level global feature matching are conducted to achieve optimal frame-to-frame matching. Second, a DL network is introduced to extract and match features between the most similar frames, enabling point-to-point motion estimation. Finally, a fast pose graph optimization method is proposed to optimize the poses in the SLAM sequence in real time.

Findings
The proposed method has been validated on the real-world FinnForest Dataset and the UZH-FPV Drone Racing Dataset, with accuracy evaluated using absolute positional error and absolute rotational error. Experimental results show that, in most cases, the method significantly reduces the root mean square error and the standard deviation of the pose estimation error, and it outperforms loop closure in terms of accuracy, indicating strong generalizability and robustness.

Originality/value
The main contribution of this study is a complete DL-based coarse-to-fine 6-DoF long-term visual relocalization method to assist vSLAM, which demonstrates enhanced robustness and generalizability and can eliminate cumulative errors in pose estimation in challenging environments.