3D Semantic VSLAM of Indoor Environment Based on Mask Scoring RCNN

Chongben Tao, Hanwen Gao, Chunguang Li, Yufeng Jin, Zufeng Zhang, Feng Cao

doi:10.1155/2020/5916205

Abstract

Existing Visual SLAM (VSLAM) algorithms suffer from low accuracy and low label classification accuracy when constructing semantic maps of indoor environments in which feature points are sparse. This paper proposes a 3D semantic VSLAM algorithm called BMASK-RCNN, based on Mask Scoring RCNN. First, image feature points are extracted with the Binary Robust Invariant Scalable Keypoints (BRISK) algorithm. Second, map points of the reference key frame are projected onto the current frame for feature matching and pose estimation, and an inverse depth filter estimates the scene depth of each newly created key frame to obtain camera pose changes. To achieve object detection and semantic segmentation of both static and dynamic objects in indoor environments, and then construct a dense 3D semantic map with the VSLAM algorithm, the structure of Mask Scoring RCNN is partially adjusted and the TUM RGB-D SLAM dataset is employed for transfer learning. Semantic information about independent targets in the scene, including their categories, not only provides high localization accuracy but also enables a probability update of the semantic estimate by marking movable objects, thereby reducing the impact of moving objects on real-time mapping. Simulations and real-world experiments comparing the proposed algorithm with three other algorithms show that it is more robust and that the semantic information used in 3D semantic mapping can be obtained accurately.
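The abstract mentions an inverse depth filter but does not give its equations. As a rough illustration only, the sketch below assumes a common formulation in which the inverse depth of a point is modeled as a Gaussian and each new observation is fused by multiplying Gaussians. The class name, the prior, and the measurement values are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class InverseDepthEstimate:
    """Gaussian belief over inverse depth rho = 1 / depth (units: 1/m)."""

    mean: float
    variance: float

    def fuse(self, measured_rho: float, measured_var: float) -> "InverseDepthEstimate":
        """Fuse one noisy inverse depth measurement into the belief.

        Assumption: both belief and measurement are Gaussian, so their
        product is a Gaussian with a precision-weighted mean.
        """
        total = self.variance + measured_var
        fused_mean = (measured_var * self.mean + self.variance * measured_rho) / total
        fused_var = (self.variance * measured_var) / total
        return InverseDepthEstimate(fused_mean, fused_var)

    @property
    def depth(self) -> float:
        """Depth in metres implied by the current mean inverse depth."""
        return 1.0 / self.mean


# Hypothetical data: a broad prior refined by three observations near 0.25 1/m.
estimate = InverseDepthEstimate(mean=0.5, variance=1.0)
for rho, var in [(0.26, 0.01), (0.24, 0.01), (0.25, 0.01)]:
    estimate = estimate.fuse(rho, var)

print(f"depth = {estimate.depth:.2f} m, variance = {estimate.variance:.5f}")
```

Filtering in inverse depth rather than depth is a frequent choice because distant points then have a bounded, more nearly Gaussian uncertainty; whether the paper uses exactly this update is not stated in the abstract.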

Highlights

Simultaneous Localization and Mapping (SLAM) is a technology that enables robots or UAVs to localize themselves and build a map autonomously in an unknown environment. The robot can gather rich information through its sensors, which makes the localization and mapping problem easier to solve. Therefore, SLAM technology is undoubtedly a priority for robot autonomy.
Compared with traditional SLAM based on laser sensors, camera-based SLAM can make full use of the rich texture information in the images taken by the camera, which is a major advantage for relocalization and for classifying scene semantic information.
Zhang et al. [3] used the collinear relationship of points to optimize an existing point-based Visual SLAM (VSLAM) algorithm and gave a practical line matching algorithm, in which compensating computation assisted by straight beams was utilized and the perspective-n-point algorithm was improved. The proposed method was evaluated on indoor sequences of different ranges in the TUM dataset and compared with point-based and line-based methods. The results show that the designed algorithm computes faster than a VSLAM system based on points and lines.

Summary

Introduction

Simultaneous Localization and Mapping (SLAM) is a technology that enables robots or UAVs to localize themselves and build a map autonomously in an unknown environment. The robot can gather rich information through its sensors, which makes the localization and mapping problem easier to solve. Therefore, SLAM technology is undoubtedly a priority for robot autonomy. Sparse image features provide only limited environmental semantic information when dealing with dynamic target motion, a lack of texture, or single-texture environments. To address these problems, hierarchical image feature extraction methods represented by deep learning have appeared in the field of VSLAM in recent years. McCormac et al. [18] proposed an improved Elastic Fusion SLAM [19] method based on a convolutional neural network (CNN) to build a dense 3D semantic map. It relies on the Elastic Fusion SLAM algorithm to estimate the interframe pose of indoor RGB-D video, uses the CNN to predict pixel-level object classes and labels, and combines a Bayesian update strategy with a conditional random field model to update the CNN predictions probabilistically across different viewpoints, thereby generating a dense 3D semantic map. The impact of moving objects during semantic mapping is reduced by marking movable objects and updating the probability of the semantic estimate accordingly.
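The Bayesian update of per-class predictions across viewpoints can be illustrated with a short sketch. It assumes the simplest recursive form, in which the stored class distribution of a map element is multiplied by the new per-class prediction and renormalized. The label set, the `MOVABLE_LABELS` set, and the observation values are hypothetical; this is not the implementation of McCormac et al. or of the authors.

```python
def fuse_label_probabilities(prior, likelihood):
    """Recursive Bayesian update: multiply per-class probabilities, then renormalize."""
    unnormalized = {label: prior[label] * likelihood[label] for label in prior}
    total = sum(unnormalized.values())
    if total == 0.0:
        # The observation contradicts the belief everywhere; keep the prior.
        return dict(prior)
    return {label: value / total for label, value in unnormalized.items()}


# Hypothetical set of classes treated as movable during mapping.
MOVABLE_LABELS = {"chair", "person"}

labels = ["chair", "table", "person"]
# Start from a uniform belief for one map element.
belief = {label: 1.0 / len(labels) for label in labels}

# Hypothetical per-view network predictions for the same map element.
observations = [
    {"chair": 0.6, "table": 0.3, "person": 0.1},
    {"chair": 0.7, "table": 0.2, "person": 0.1},
    {"chair": 0.5, "table": 0.4, "person": 0.1},
]
for observation in observations:
    belief = fuse_label_probabilities(belief, observation)

best_label = max(belief, key=belief.get)
is_movable = best_label in MOVABLE_LABELS
print(best_label, round(belief[best_label], 3), is_movable)
```

Consistent predictions from several viewpoints sharpen the distribution, and a map element whose most likely label is in the movable set can then be down-weighted or excluded when the static map is built.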

Three-Dimensional Map Generation

Experiments and Analysis


Journal: Discrete Dynamics in Nature and Society
Publication Date: Oct 20, 2020
Citations: 2
License type: CC BY 4.0

Similar Papers

A Methodology for Principled Approximation in Visual SLAM
Yan Pei ... Donald S. Fussell
30 Sep 2020

Real-Time Dynamic SLAM Algorithm Based on Deep Learning
Peng Su ... Suyun Luo
IEEE Access | VOL. 10
01 Jan 2021

Semantic Visual SLAM Algorithm Based on Improved DeepLabV3+ Model and LK Optical Flow
Yiming Li ... Liuwei Lu
Applied Sciences | VOL. 14
02 Jul 2024

3D Semantic Map Construction System Based on Visual SLAM and CNNs
Lei Lai ... Xinyi Yu
18 Oct 2020
