Abstract
When robots build indoor environmental semantic maps with visual SLAM (VSLAM), label classification accuracy and mapping precision degrade when image feature points are sparse. To address this, this paper proposes an indoor three-dimensional semantic VSLAM algorithm based on the Mask Region-based Convolutional Neural Network (Mask RCNN). First, the Oriented FAST and Rotated BRIEF (ORB) algorithm is used to extract image feature points. Second, the Random Sample Consensus (RANSAC) algorithm is employed to eliminate mismatched points and estimate the camera's pose change. Then, the Mask RCNN is adapted by partially adjusting its hyperparameters and fine-tuned on a self-made dataset through transfer learning, enabling real-time object detection and instance segmentation of the scene. A three-dimensional semantic map is then constructed in combination with the VSLAM algorithm. The semantic information in the environment not only improves the accuracy of map construction and localization, but also reduces the impact of object movement on mapping by marking movable objects. Meanwhile, the VSLAM algorithm is used to compute positional constraints between objects, improving the accuracy of semantic understanding. Finally, comparisons with other methods show that the proposed approach is more accurate and effective, and verify that it can correctly interpret the semantic information in the environment to construct three-dimensional semantic maps.
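As a rough illustration of the geometric front end described above, the sketch below chains ORB feature extraction, descriptor matching, and RANSAC-based essential-matrix fitting to recover the relative camera pose between two frames with OpenCV. The specific settings (nfeatures=1000, the 0.75 ratio-test threshold, the RANSAC parameters) are illustrative assumptions, not the paper's exact configuration.

```python
import cv2
import numpy as np

def estimate_pose(img1, img2, K):
    """Relative pose between two frames: ORB features -> matching ->
    RANSAC outlier rejection -> essential-matrix pose recovery.
    K is the 3x3 camera intrinsic matrix."""
    # 1. Extract ORB keypoints and descriptors (Oriented FAST + Rotated BRIEF).
    orb = cv2.ORB_create(nfeatures=1000)
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)

    # 2. Match binary descriptors with Hamming distance and keep pairs
    #    that pass a ratio test.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    pairs = matcher.knnMatch(des1, des2, k=2)
    good = [p[0] for p in pairs
            if len(p) == 2 and p[0].distance < 0.75 * p[1].distance]

    pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

    # 3. RANSAC rejects mismatched pairs while fitting the essential matrix.
    E, inlier_mask = cv2.findEssentialMat(pts1, pts2, K,
                                          method=cv2.RANSAC,
                                          prob=0.999, threshold=1.0)

    # 4. Recover the relative rotation R and unit-scale translation t
    #    from the inlier correspondences.
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inlier_mask)
    return R, t, inlier_mask
```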
Highlights
The problem of a robot incrementally building a map of an unknown environment while simultaneously using that map to estimate and update its own position is called the Simultaneous Localization and Mapping (SLAM) problem
Visual Simultaneous Localization and Mapping (VSLAM) is the SLAM problem in which a camera is the only sensor
Whelan et al. proposed Semantic Fusion, a convolutional-neural-network-based method for constructing dense three-dimensional semantic maps; it relies on the Elastic Fusion SLAM algorithm for inter-frame pose estimation from indoor RGB-D video and uses a convolutional neural network to predict pixel-level object category labels, combining a Bayesian update strategy with a conditional random field model
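The per-element Bayesian update mentioned in this highlight can be sketched as follows: each map element keeps a class-probability vector that is multiplied by the CNN prediction from the current viewpoint and renormalised. The array layout and the toy class set are illustrative assumptions, not the cited system's implementation.

```python
import numpy as np

def bayesian_label_update(map_probs, frame_probs):
    """Fuse a new per-pixel CNN prediction into the class distributions
    stored for the corresponding map elements (one row per element).

    map_probs   : (N, C) current class probabilities held in the map
    frame_probs : (N, C) CNN softmax output for the pixels observing them
    """
    # Recursive Bayesian update: posterior is proportional to prior * likelihood.
    fused = map_probs * frame_probs
    # Renormalise each element's distribution so it sums to one.
    fused /= fused.sum(axis=1, keepdims=True) + 1e-12
    return fused

# Toy example: two map elements, three classes ("wall", "chair", "floor").
prior = np.array([[0.4, 0.4, 0.2],
                  [0.3, 0.3, 0.4]])
cnn   = np.array([[0.1, 0.8, 0.1],
                  [0.2, 0.2, 0.6]])
print(bayesian_label_update(prior, cnn))
```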
Summary
The problem of a robot incrementally building a map of an unknown environment while simultaneously using that map to estimate and update its own position is called the Simultaneous Localization and Mapping (SLAM) problem. Whelan et al. proposed Semantic Fusion, a convolutional-neural-network-based method for constructing dense three-dimensional semantic maps; it relies on the Elastic Fusion SLAM algorithm for inter-frame pose estimation from indoor RGB-D video and uses a convolutional neural network to predict pixel-level object category labels, combining a Bayesian update strategy with a conditional random field model. By probabilistically fusing the CNN predictions obtained from different viewpoints, it generates a dense three-dimensional map containing semantic information [10]. AP stands for Average Precision, an indicator of the accuracy of an object detection algorithm
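Since the summary introduces AP, the sketch below shows how Average Precision is commonly computed as the area under a precision-recall curve with all-point interpolation (as in PASCAL VOC 2010+). The toy recall/precision values are illustrative and are not results from the paper.

```python
import numpy as np

def average_precision(recalls, precisions):
    """AP as the area under the precision-recall curve,
    using all-point interpolation."""
    # Pad the curve so it spans recall 0..1.
    r = np.concatenate(([0.0], recalls, [1.0]))
    p = np.concatenate(([0.0], precisions, [0.0]))
    # Make precision monotonically non-increasing from right to left.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum precision * recall-step at the points where recall changes.
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))

# Toy example: detections sorted by confidence yield this PR curve.
recalls    = np.array([0.2, 0.4, 0.4, 0.6, 0.8])
precisions = np.array([1.0, 1.0, 0.67, 0.75, 0.8])
print(round(average_precision(recalls, precisions), 3))
```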