Abstract

Visual simultaneous localization and mapping (SLAM) is important for the self-localization and environment perception of service robots, and semantic SLAM can provide more accurate localization results and a map with rich semantic information. In this article, we propose a real-time PO-SLAM approach that combines point and object measurements. In addition to the point–point association of ORB-SLAM2, we consider point–object association based on object segmentation, as well as object–object association, where object segmentation is performed by combining object detection with a depth histogram. Besides the constraint that feature points belong to an object, a semantic constraint of relative position invariance among objects is introduced. Accordingly, two semantic loss functions built on point and object information are designed and added to the bundle adjustment optimization. The effectiveness of the proposed approach is verified by experiments.
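As a rough illustration of the segmentation idea described above (combining object detection with a depth histogram), the sketch below refines a detector's bounding box into a pixel mask. This is only a minimal reconstruction under stated assumptions, not the paper's implementation; the function name, bin count, and depth-band threshold are all hypothetical.

```python
import numpy as np

def segment_object_depth(depth, bbox, n_bins=32, band=0.15):
    """Hypothetical sketch: refine a detection bounding box into an
    object mask using the depth histogram inside the box.

    depth : (H, W) array of depths in meters (0 = invalid)
    bbox  : (x1, y1, x2, y2) detection rectangle
    band  : half-width (m) of the accepted depth band around the peak
    """
    x1, y1, x2, y2 = bbox
    roi = depth[y1:y2, x1:x2]
    valid = roi[roi > 0]
    if valid.size == 0:
        return np.zeros_like(roi, dtype=bool)
    # Histogram the depths inside the box: the object tends to
    # concentrate in one dominant bin, while background pixels
    # spread over farther depth bins.
    hist, edges = np.histogram(valid, bins=n_bins)
    peak = np.argmax(hist)
    center = 0.5 * (edges[peak] + edges[peak + 1])
    # Keep valid pixels within a depth band around the dominant peak.
    return (roi > 0) & (np.abs(roi - center) <= band)
```

The design assumption is that the detected object occupies the dominant depth mode inside its box, so thresholding around the histogram peak strips background pixels that the rectangular box inevitably includes.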

Highlights

  • Simultaneous localization and mapping (SLAM) has become a very popular research direction in recent years; it requires constructing and updating an environment map while simultaneously tracking an agent’s position.[1,2] SLAM has a variety of applications such as autonomous driving, mobile robots, and virtual reality

  • We focus on RGB-D SLAM

  • We propose a real-time visual Point-Object SLAM (PO-SLAM) approach on the basis of RGB-D Oriented FAST and Rotated BRIEF (ORB)-SLAM2, which incorporates an object–object constraint in the bundle adjustment (BA) optimization process


Summary

Introduction

Simultaneous localization and mapping (SLAM) has become a very popular research direction in recent years; it requires constructing and updating an environment map while simultaneously tracking an agent’s position.[1,2] SLAM has a variety of applications such as autonomous driving, mobile robots, and virtual reality. A 3-D cuboid object detection approach has been proposed,[22] and it is combined with Oriented FAST and Rotated BRIEF (ORB) feature points to build semantic error functions for static and dynamic environments, respectively. On this basis, the poses of points, 3-D cuboids, and cameras are jointly optimized. The framework of the proposed semantic PO-SLAM is shown, where point features, point–point association, and the point–point constraint are used directly from ORB-SLAM2.[7] In the feature extraction module, object features are extracted from the color image provided by the RGB-D camera using YOLOv3.[16] Considering that object detection cannot accurately express the contours of objects, we utilize the depth image to geometrically segment the detected objects based on depth histograms. Here p1 and p2 represent the 2-D pixel coordinates of the projections of the object centroids Cj1 and Cj2 in the image, respectively, and D(·) refers to the Euclidean distance between two pixels; the direction error e^oo_dir and the distance error e^oo_dis together constitute the object–object error function
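The object–object error described above (relative position invariance among objects, measured through the projected centroids p1 and p2 and the pixel distance D(·)) can be sketched as follows. This is a hedged illustration, not the paper's exact formulation: the function names, the use of a reference direction/distance from an earlier keyframe, and the 1 − cos form of the direction term are assumptions.

```python
import numpy as np

def project(K, T_cw, C_w):
    """Project a 3-D world point into pixel coordinates.
    K: 3x3 pinhole intrinsics, T_cw: 4x4 world-to-camera pose."""
    C_c = T_cw[:3, :3] @ C_w + T_cw[:3, 3]
    uv = K @ C_c
    return uv[:2] / uv[2]

def object_object_error(K, T_cw, C1, C2, dir_ref, dis_ref):
    """Hypothetical sketch of the object–object error terms.

    p1, p2 are the projections of the two object centroids C1, C2.
    e_dis penalizes a change of the Euclidean pixel distance D(p1, p2)
    against a reference distance, and e_dir penalizes a change of the
    inter-object direction against a reference unit vector (both
    references assumed to come from an earlier keyframe).
    """
    p1 = project(K, T_cw, C1)
    p2 = project(K, T_cw, C2)
    v = p2 - p1
    e_dis = np.linalg.norm(v) - dis_ref               # distance term
    e_dir = 1.0 - np.dot(v, dir_ref) / np.linalg.norm(v)  # direction term
    return e_dir, e_dis
```

In a BA back end, residuals of this kind would be added alongside the usual point reprojection errors, so that camera and object estimates that distort the relative layout of static objects are penalized.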

