3FO: The Three-Frame-Only Approach for Fast and Accurate Monocular SLAM Initialization

Peng Zhang, Wenfen Liu

doi:10.1109/access.2022.3213684

Abstract

The monocular simultaneous localization and mapping (SLAM) system is one of the most important among all visual or visual-inertial SLAM (VSLAM or VISLAM) systems owing to its low cost and ease of calibration and identification. Initialization is crucial for bootstrapping a monocular SLAM system. With the rapid growth of SLAM applications that require a quick start, e.g., augmented reality (AR) and unmanned aerial vehicles (UAVs), devising faster and more accurate initialization has become a central problem. Traditional initialization uses the first two frames to create landmarks and then computes camera poses for the subsequent frames using a 3D-to-2D perspective-n-point (PnP) mechanism. In this paper, we propose a novel three-frame-only (3FO) initialization approach for the monocular SLAM system, which consists of two steps. In the first step, we use the first two frames to preinitialize poses and landmarks. In the second step, we use the second and third frames to improve the preinitialization: the scale consistency of the landmarks generated in the first step is used to filter out outliers, and the remaining inliers are used to generate more robust landmarks. In both steps, we use a pretrained multilayer perceptron (MLP) combined with homotopy continuation to solve for the essential matrices. Finally, we perform a global bundle adjustment (BA) to refine the three camera poses and all the created landmarks. The proposed 3FO initialization approach is evaluated experimentally on the EuRoC benchmark data set in terms of initialization time and trajectory metrics. The results show that, compared to the traditional ORB-SLAM2 initialization, the 3FO approach reduces the initialization time by a factor of 5 and improves the accuracy by 36.7% on average.
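To make the outlier-filtering idea in the second step more concrete, the following is a minimal sketch of one possible scale-consistency check. It is not the implementation described in the paper. It assumes that each landmark has a depth estimated from frames 1-2 and a second depth estimated from frames 2-3, both expressed in the second camera's frame, so that the ratio of the two depths should equal a single relative scale for all correctly matched landmarks. The function name `scale_consistent_inliers`, the median-based scale estimate, and the tolerance `rel_tol` are all illustrative assumptions.

```python
from statistics import median


def scale_consistent_inliers(depths_12, depths_23, rel_tol=0.1):
    """Hypothetical scale-consistency filter (illustrative, not the paper's method).

    depths_12[i]: depth of landmark i triangulated from frames 1 and 2.
    depths_23[i]: depth of the same landmark triangulated from frames 2 and 3.
    Both depths are assumed to be expressed in the second camera's frame.

    Returns (scale, inlier_indices), or (None, []) if no landmark is usable.
    """
    if len(depths_12) != len(depths_23):
        raise ValueError("depth lists must have the same length")

    # Keep only landmarks in front of the camera in both reconstructions.
    ratios = [
        (i, d23 / d12)
        for i, (d12, d23) in enumerate(zip(depths_12, depths_23))
        if d12 > 0 and d23 > 0
    ]
    if not ratios:
        return None, []

    # The median ratio is a robust estimate of the relative scale
    # between the two two-view reconstructions (an assumed choice).
    scale = median(r for _, r in ratios)

    # A landmark is an inlier if its ratio agrees with the common scale
    # to within the relative tolerance.
    inliers = [i for i, r in ratios if abs(r - scale) <= rel_tol * scale]
    return scale, inliers


if __name__ == "__main__":
    d12 = [2.0, 4.0, 1.0, 3.0, 5.0]
    d23 = [1.0, 2.0, 0.5, 4.5, 2.5]  # landmark 3 disagrees with the others
    print(scale_consistent_inliers(d12, d23))
```

In this sketch the median is chosen because it tolerates a minority of mismatched landmarks; a full system would likely combine such a check with reprojection-error tests before the global BA.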

Full Text