Abstract

For a SLAM system operating in a dynamic indoor environment, pose estimation accuracy and visual odometry stability degrade because the system is easily affected by moving obstacles. In this paper, a visual SLAM algorithm based on the YOLOv4-Tiny network is proposed, together with a dynamic feature point elimination strategy built on the traditional ORB-SLAM2 framework. To obtain semantic information, object detection is carried out while the feature points of the image are extracted. In addition, the epipolar geometry algorithm and the LK optical flow method are employed to detect dynamic objects. The dynamic feature points are removed in the tracking thread, and only the static feature points are used to estimate the camera pose. The proposed method is evaluated on the TUM dataset. The experimental results show that, compared with ORB-SLAM2, our algorithm improves the camera pose estimation accuracy by 93.35% in a highly dynamic environment. Additionally, the average time our algorithm needs to process an image frame in the tracking thread is 21.49 ms, achieving real-time performance.
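The dynamic-object check described in the abstract combines LK optical flow with the epipolar constraint. The sketch below is a minimal, hypothetical Python/OpenCV illustration of that idea, not the paper's implementation: the function name, the (N, 1, 2) point format, and the 1-pixel threshold are assumptions. It tracks feature points between frames, fits a fundamental matrix with RANSAC, and labels points far from their epipolar lines as dynamic.

```python
import cv2
import numpy as np

def filter_dynamic_points(prev_gray, curr_gray, prev_pts, dist_thresh=1.0):
    """Split tracked feature points into static/dynamic sets using the
    epipolar constraint; dist_thresh is a pixel threshold (an assumption)."""
    # Track feature points into the current frame with LK optical flow.
    curr_pts, status, _ = cv2.calcOpticalFlowPyrLK(
        prev_gray, curr_gray, prev_pts, None)
    ok = status.ravel() == 1
    p0 = prev_pts[ok].reshape(-1, 2)
    p1 = curr_pts[ok].reshape(-1, 2)
    if len(p0) < 8:  # not enough correspondences for a fundamental matrix
        return p1, np.empty((0, 2), np.float32)

    # Fit the fundamental matrix with RANSAC; inliers are dominated by the
    # static background, so F encodes the camera's own motion.
    F, _ = cv2.findFundamentalMat(p0, p1, cv2.FM_RANSAC)
    if F is None or F.shape != (3, 3):
        return p1, np.empty((0, 2), np.float32)

    # Distance of each current point to its epipolar line l = F @ x_prev.
    h0 = np.hstack([p0, np.ones((len(p0), 1), np.float32)])
    h1 = np.hstack([p1, np.ones((len(p1), 1), np.float32)])
    lines = (F @ h0.T).T
    dist = np.abs(np.sum(lines * h1, axis=1)) / (
        np.hypot(lines[:, 0], lines[:, 1]) + 1e-9)

    # Points far from their epipolar line are treated as dynamic.
    return p1[dist < dist_thresh], p1[dist >= dist_thresh]
```

Points from cv2.goodFeaturesToTrack, or ORB keypoint coordinates reshaped to a float32 array of shape (N, 1, 2), can serve as prev_pts; the returned dynamic set corresponds to the feature points a tracking thread would discard.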

Highlights

  • Traditional visual simultaneous localization and mapping (SLAM) systems can achieve good results in static and rigid scenes with no obvious changes

  • With the development of computer vision and deep learning, many researchers have begun to apply semantic information extracted from images to visual SLAM systems, for example by building semantic maps and removing movable objects from the environment

  • In an indoor dynamic environment, SLAM systems are easily affected by moving objects, which may reduce pose estimation accuracy and cause tracking failures

Introduction

Traditional visual simultaneous localization and mapping (SLAM) systems can achieve good results in static, rigid scenes with no obvious changes. In dynamic scenes, however, matched pairs involving dynamic feature points produce erroneous data associations, which directly reduce the pose estimation accuracy of the visual odometry and can cause the camera to lose pose tracking. This greatly limits the application of many excellent visual SLAM algorithms. Geometry-based methods operate on image pixels and detect moving objects with high accuracy, but they incur high computational cost and poor real-time performance. Detecting and removing dynamic objects during SLAM with deep learning methods can greatly improve the performance of SLAM systems.
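To make the deep learning route concrete, the following sketch shows how a lightweight detector could supply a semantic prior for movable objects. It is an illustrative assumption rather than the authors' code: it loads YOLOv4-Tiny through OpenCV's DNN module (the yolov4-tiny.cfg and yolov4-tiny.weights file names are placeholders for the published Darknet files) and discards keypoints inside bounding boxes of movable classes such as COCO's "person".

```python
import cv2
import numpy as np

# Placeholder paths (assumption): the official YOLOv4-Tiny config and
# weights are expected to be available locally under these names.
net = cv2.dnn.readNetFromDarknet("yolov4-tiny.cfg", "yolov4-tiny.weights")
model = cv2.dnn_DetectionModel(net)
model.setInputParams(size=(416, 416), scale=1 / 255.0, swapRB=True)

MOVABLE_IDS = {0}  # COCO class 0 = "person"; extend with other movable classes

def drop_keypoints_on_movable_objects(frame_bgr, keypoints, conf=0.5):
    """Remove cv2.KeyPoint objects lying inside detected movable objects."""
    class_ids, _, boxes = model.detect(frame_bgr, confThreshold=conf)
    class_ids = np.asarray(class_ids).ravel()
    kept = []
    for kp in keypoints:
        x, y = kp.pt
        inside = any(
            cid in MOVABLE_IDS and bx <= x <= bx + bw and by <= y <= by + bh
            for cid, (bx, by, bw, bh) in zip(class_ids, boxes))
        if not inside:
            kept.append(kp)
    return kept
```

In practice such a coarse semantic filter is usually paired with a geometric check like the epipolar test sketched earlier, since a bounding box also covers background pixels and a detector can miss movable objects outside its training classes.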
