Abstract

Simultaneous Localization and Mapping (SLAM) is a key requirement for many practical applications of robotics. However, traditional visual approaches rely on features extracted from textured surfaces, so they perform poorly in indoor scenes with little texture (e.g. long corridors dominated by smooth walls). In this work, we propose a novel visual odometry method that overcomes these limitations by integrating the structural regularities of man-made environments into a direct sparse visual odometry system. By fully exploiting structural lines that align with the dominant directions of the Manhattan world, our approach becomes more accurate and more robust in texture-less indoor environments, especially long corridors. Given a sequence of input images, we first use the direct sparse method to obtain the coarse relative pose between camera frames and then calculate the vanishing points in each frame. Second, we use structural lines as rotation constraints and perform a sliding window optimization that reduces both photometric and rotation errors, further improving trajectory accuracy. Benchmark tests show that our method outperforms the existing visual odometry approach in long corridor environments.

Highlights

  • Estimating the position and orientation of an agent in an indoor scene is a challenging problem that is usually addressed by Simultaneous Localization and Mapping (SLAM) technologies (Bailey, Durrant-Whyte, 2006)

  • The advantage of using this structural regularity in a visual odometry (VO) system is obvious: parallel lines aligned with the Manhattan world create direction constraints that prevent local direction errors from growing

  • We described the combination in detail, including the Manhattan world representation, structural line parameterization, and error term design


Summary

INTRODUCTION

Estimating the position and orientation of an agent in an indoor scene is a challenging problem that is usually addressed by Simultaneous Localization and Mapping (SLAM) technologies (Bailey, Durrant-Whyte, 2006). Architectural scenes, with their planes, texture-less walls, sharp angles and axially aligned geometries, often exhibit strong structural regularity, including parallelism and orthogonality, as shown in Figure 1 (Zhou et al., 2019). The existence of these structures provides an opportunity to constrain and simplify pose estimation. Such scenes can be abstracted as a Manhattan world (Coughlan, Yuille, 1999). The advantage of using this structural regularity in a VO system is obvious: parallel lines aligned with the Manhattan world create direction constraints that prevent local direction errors from growing. We designed new error terms to incorporate the structural information into local sliding window optimizations, which are performed over several keyframes to refine the camera pose and the point depth.
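To make the idea of a direction constraint concrete, the following is a minimal sketch and not the error term used in the paper. It assumes a pinhole camera with a hypothetical intrinsic matrix `K`, a world-to-camera rotation `R_cw`, and a unit Manhattan axis expressed in the world frame. An image segment and the camera centre span a plane. If the segment is the projection of a 3D line parallel to the axis, the rotated axis must lie in that plane, so its dot product with the plane normal is zero. All function names here are illustrative.

```python
import numpy as np

# Hypothetical pinhole intrinsics (focal length 500 px, principal point 320, 240).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])


def rot_z(angle_rad):
    """Rotation about the camera z-axis (used here only to simulate an orientation error)."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, -s, 0.0],
                     [s, c, 0.0],
                     [0.0, 0.0, 1.0]])


def interpretation_plane_normal(p1, p2, K):
    """Unit normal of the plane through the camera centre and an image segment."""
    K_inv = np.linalg.inv(K)
    # Back-project both endpoints (pixel coordinates) to viewing rays.
    r1 = K_inv @ np.array([p1[0], p1[1], 1.0])
    r2 = K_inv @ np.array([p2[0], p2[1], 1.0])
    n = np.cross(r1, r2)
    return n / np.linalg.norm(n)


def direction_residual(p1, p2, K, R_cw, axis_world):
    """Signed residual: zero when the rotated Manhattan axis lies in the segment's plane."""
    n = interpretation_plane_normal(p1, p2, K)
    d_cam = R_cw @ axis_world  # Manhattan axis expressed in the camera frame
    return float(n @ d_cam)


# A horizontal image segment and the world x-axis as the Manhattan direction.
p1, p2 = (100.0, 300.0), (500.0, 300.0)
x_axis = np.array([1.0, 0.0, 0.0])

res_aligned = direction_residual(p1, p2, K, np.eye(3), x_axis)
res_rotated = direction_residual(p1, p2, K, rot_z(np.deg2rad(5.0)), x_axis)
```

With the identity rotation the residual vanishes, while a 5 degree orientation error produces a clearly non-zero value. The residual depends only on rotation and not on translation, which is why constraints of this kind can bound orientation drift independently of the photometric terms.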

RELATED WORK
SYSTEM OVERVIEW
Line Segment Detection and Vanishing Point Estimation
Error Function and Jacobian Calculation
Initialization and Coarse Tracking
Absolute Rotation Optimization using Relative Rotation
SLIDING WINDOW OPTIMIZATION
Key Frame Selection and Marginalization
Objective Function Optimization
EXPERIMENTS
CONCLUSION
