Abstract

The fusion of monocular visual and inertial cues has become popular in robotics, unmanned vehicles, and augmented reality. Recent results have shown that optimization-based fusion strategies outperform filtering strategies. Robust state estimation is the core capability of optimization-based visual–inertial Simultaneous Localization and Mapping (SLAM) systems. Because visual–inertial systems are nonlinear, their performance relies heavily on the accuracy of the initial values (visual scale, gravity, velocity, and Inertial Measurement Unit (IMU) biases). This paper therefore proposes a more accurate initial state estimation method. Based on the known gravity magnitude, we propose an approach that refines the estimated gravity vector by optimizing a two-dimensional (2D) error state on its tangent space, and then estimates the accelerometer bias separately, since the bias is difficult to distinguish from gravity under small rotations. Additionally, we propose an automatic termination criterion to determine when the initialization has succeeded. Once the initial state estimation converges, the estimated values are used to launch the nonlinear tightly coupled visual–inertial SLAM system. We evaluated our approach on the public EuRoC dataset. Experimental results show that the proposed methods achieve good initial state estimation, that the gravity refinement approach efficiently speeds up the convergence of the estimated gravity vector, and that the termination criterion performs well.
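
The core idea of the gravity refinement is to exploit the known magnitude of gravity so that only the 2-DoF direction remains to be estimated. The following is a minimal numpy sketch of that tangent-space parameterization, not the paper's implementation: the callback `residual_and_jacobian`, the constant `GRAVITY_MAG`, and the iteration count are our own stand-ins for the actual visual–inertial alignment constraints.

```python
import numpy as np

GRAVITY_MAG = 9.81  # known gravity magnitude (m/s^2)

def tangent_basis(g):
    """Two unit vectors spanning the tangent plane of the sphere
    ||g|| = GRAVITY_MAG at the direction of g."""
    a = g / np.linalg.norm(g)
    helper = np.array([0.0, 0.0, 1.0]) if abs(a[0]) > 0.9 else np.array([1.0, 0.0, 0.0])
    b1 = np.cross(a, helper)
    b1 /= np.linalg.norm(b1)
    b2 = np.cross(a, b1)  # already unit length: a and b1 are orthonormal
    return b1, b2

def refine_gravity(g0, residual_and_jacobian, iters=4):
    """Refine a gravity estimate while keeping its magnitude fixed.

    Each iteration re-parameterizes g <- G * dir(g) + w1*b1 + w2*b2 and
    solves a small linear least-squares problem for the 2D error state
    w = (w1, w2) on the tangent space. `residual_and_jacobian(g)` is a
    user-supplied callback returning the stacked alignment residual r
    (shape N) and its Jacobian J with respect to g (shape N x 3), built
    from the IMU pre-integration / visual alignment constraints.
    """
    g = GRAVITY_MAG * g0 / np.linalg.norm(g0)
    for _ in range(iters):
        b1, b2 = tangent_basis(g)
        B = np.stack([b1, b2], axis=1)                  # 3 x 2 tangent basis
        r, J = residual_and_jacobian(g)
        w, *_ = np.linalg.lstsq(J @ B, -r, rcond=None)  # 2-DoF update
        g = g + B @ w
        g = GRAVITY_MAG * g / np.linalg.norm(g)         # re-project onto sphere
    return g
```

Constraining the update to two degrees of freedom keeps ||g|| fixed at its known magnitude; this is the point of optimizing on the tangent space rather than over all three components of g.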

Highlights

  • In recent years, visual Simultaneous Localization and Mapping (SLAM) has reached a mature stage, and there exist a number of robust real-time systems or solutions [1,2,3]

  • The concept of using one camera has become popular since the emergence of MonoSLAM [4], which is based on the extended Kalman filter (EKF) framework and is able to achieve real-time localization and mapping indoors in room-sized domains

  • Because the EuRoC dataset does not provide an explicit ground-truth scale, we need to calculate the true scale from the ground-truth data and the trajectory generated by visual ORB-SLAM2 (see the alignment sketch after this list)

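The last highlight recovers the true scale by comparing the metric ground truth with the up-to-scale trajectory from monocular ORB-SLAM2. Below is a minimal sketch of one standard way to do this, using the symmetric closed-form scale from Horn-style similarity alignment; the function name and interface are ours, and the two inputs are assumed to be time-associated N x 3 position arrays.

```python
import numpy as np

def trajectory_scale(gt, est):
    """Closed-form scale aligning an up-to-scale monocular trajectory `est`
    to the metric ground truth `gt` (both N x 3, time-associated positions).
    This is the symmetric scale from Horn-style similarity alignment:
    rotation preserves norms, so it drops out of the ratio."""
    gt_c = gt - gt.mean(axis=0)    # center both trajectories
    est_c = est - est.mean(axis=0)
    return np.sqrt((gt_c ** 2).sum() / (est_c ** 2).sum())

# Example with synthetic data: a trajectory scaled down by 1/2.5
t = np.linspace(0.0, 1.0, 100)
gt = np.stack([np.cos(t), np.sin(t), t], axis=1)
est = gt / 2.5
print(trajectory_scale(gt, est))   # ~2.5
```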

Introduction

Visual SLAM has reached a mature stage, and there exist a number of robust real-time systems or solutions [1,2,3]. The concept of using one camera has become popular since the emergence of MonoSLAM [4], which is based on the extended Kalman filter (EKF) framework and is able to achieve real-time localization and mapping indoors in room-sized domains. There have been many scholarly works on monocular visual SLAM, including PTAM [1], SVO [5], and ORB-SLAM2 [6]. PTAM [1] was the first optimization-based solution to split tracking and mapping into separate tasks processed in two parallel threads. ORB-SLAM2 [6] builds on the advantages of PTAM and further improves it.
