Abstract

Most Simultaneous Localization and Mapping (SLAM) methods assume that environments are static. Such a strong assumption limits the application of most visual SLAM systems, because dynamic objects cause many incorrect data associations during the SLAM process. To address this problem, this paper proposes DM-SLAM, a novel visual SLAM method that follows the pipeline of feature-based methods. DM-SLAM combines an instance segmentation network with optical flow information to improve localization accuracy in dynamic environments, and it supports monocular, stereo, and RGB-D sensors. It consists of four modules: semantic segmentation, ego-motion estimation, dynamic point detection, and a feature-based SLAM framework. The semantic segmentation module obtains pixel-wise segmentation results of potentially dynamic objects, and the ego-motion estimation module calculates the initial pose. The third module applies two different strategies to detect dynamic feature points, one for the RGB-D/stereo case and one for the monocular case. In the first case, feature points with depth information are reprojected into the current frame, and the reprojection offset vectors are used to distinguish dynamic points. In the other case, the epipolar constraint is used to accomplish this task. The remaining static feature points are then fed into the fourth module. Experimental results on the public TUM and KITTI datasets demonstrate that DM-SLAM outperforms standard visual SLAM baselines in terms of accuracy in highly dynamic environments.
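The two detection strategies described above amount to simple geometric residual tests. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: for the RGB-D/stereo case, a feature with depth is back-projected, transformed by the initial ego-motion estimate (R, t), reprojected into the current frame, and the offset to its matched feature is thresholded; for the monocular case, the distance of the matched point to its epipolar line (from a fundamental matrix F consistent with the ego-motion) is thresholded instead. All function names and threshold values are illustrative.

    import numpy as np

    def backproject(u, v, depth, K):
        """Back-project a pixel with known depth to a 3D point in the
        previous camera frame, using intrinsics K."""
        fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
        return np.array([(u - cx) * depth / fx, (v - cy) * depth / fy, depth])

    def reprojection_offsets(pts_prev, depths_prev, pts_curr, R, t, K):
        """RGB-D/stereo case: reproject previous-frame points through the
        initial ego-motion estimate (R, t) and measure the pixel offset to
        their matched current-frame features."""
        offsets = []
        for (u, v), d, q in zip(pts_prev, depths_prev, pts_curr):
            P = backproject(u, v, d, K)   # 3D point, previous camera frame
            uv_h = K @ (R @ P + t)        # transform and project
            uv = uv_h[:2] / uv_h[2]
            offsets.append(np.linalg.norm(uv - q))
        return np.array(offsets)

    def epipolar_distances(pts_prev, pts_curr, F):
        """Monocular case: distance of each current-frame point to the
        epipolar line induced by its previous-frame match."""
        dists = []
        for p, q in zip(pts_prev, pts_curr):
            l = F @ np.array([p[0], p[1], 1.0])           # epipolar line
            d = abs(l @ np.array([q[0], q[1], 1.0])) / np.hypot(l[0], l[1])
            dists.append(d)
        return np.array(dists)

    # A match whose residual exceeds the threshold is flagged as dynamic
    # and excluded from tracking (thresholds are illustrative, in pixels).
    OFFSET_THRESH, EPI_THRESH = 3.0, 1.0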

Highlights

  • Simultaneous Localization and Mapping (SLAM) is one of the key technologies in the field of intelligent mobile robots

  • We present the experimental results of DM-SLAM on the public datasets TUM RGB-D and KITTI

  • To demonstrate the improvement of DM-SLAM in dynamic scenes, we compare it with state-of-the-art visual SLAM systems, i.e., DS-SLAM [34], DynaSLAM [36], and ORB-SLAM2 [3]

Summary

Introduction

Simultaneous Localization and Mapping (SLAM) is one of the key technologies in the field of intelligent mobile robots. Many visual SLAM systems have achieved excellent performance under certain circumstances (e.g., DTAM [1], LSD-SLAM [2], and ORB-SLAM2 [3]). Nevertheless, it is challenging for almost all existing visual SLAM systems to provide accurate and robust localization in real-world environments, because ubiquitous moving objects cause errors in the camera motion computation. There are several solutions to this problem. One is traditional robust estimation, such as RANSAC [4], which removes dynamic information that occupies a small part of the scene as outliers; however, it may fail when dynamic objects dominate the scene. Alternatively, a system can extract dynamic objects from the scene via ego-motion compensation using information captured from several sensors, but this is not cost-effective, and cameras are often the only sensors available. In this paper, we focus on how to eliminate the effect of dynamic objects using only cameras.
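As context for the RANSAC baseline mentioned above, the following is a minimal sketch, assuming OpenCV and matched feature arrays, of how RANSAC-based essential-matrix estimation rejects inconsistent correspondences during motion estimation. The intrinsics and all names are placeholders, not values from the paper.

    import cv2
    import numpy as np

    # Placeholder pinhole intrinsics (illustrative, not from the paper).
    K = np.array([[525.0,   0.0, 319.5],
                  [  0.0, 525.0, 239.5],
                  [  0.0,   0.0,   1.0]])

    def estimate_motion_ransac(pts_prev, pts_curr, K):
        """Estimate relative camera motion from Nx2 matched feature arrays
        while rejecting outliers with RANSAC. Matches on moving objects tend
        to be discarded as outliers, but the estimate degrades when dynamic
        objects dominate the scene."""
        E, inlier_mask = cv2.findEssentialMat(
            pts_prev, pts_curr, K, method=cv2.RANSAC,
            prob=0.999, threshold=1.0)
        # Recover rotation and translation from the RANSAC inliers only.
        _, R, t, _ = cv2.recoverPose(E, pts_prev, pts_curr, K, mask=inlier_mask)
        return R, t, inlier_mask.ravel().astype(bool)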
