Abstract

Visual SLAM (V-SLAM) relies on cameras as its primary sensor. Conventional mapping uses only the spatial geometry of the point cloud and therefore lacks semantic information about the objects in the environment. This paper proposes a new semantic mapping algorithm based on an improved YOLOv5. First, a Pyramid Scene Parsing Network (PSPNet) segmentation head is added to YOLOv5 to extract semantic information from the environment. Next, the robot pose is estimated with the ORB-SLAM2 framework. Finally, the semantic images, depth images, and pose transformation matrices are passed to a mapping module, which fuses them into a dense semantic point cloud map. Experiments show that the proposed algorithm builds an accurate semantic map on the KITTI dataset. Combined with depth maps from which interference factors have been removed, the method achieves good accuracy and robustness for semantic mapping in large-scale scenes.
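As a minimal sketch of the final fusion step described above, the snippet below back-projects a depth image into a world-frame point cloud and attaches per-pixel semantic labels. It assumes a standard pinhole camera model with intrinsics K and a camera-to-world pose T_wc produced by the tracker; the function name and signature are illustrative, not taken from the paper.

```python
import numpy as np

def fuse_semantic_point_cloud(depth, labels, K, T_wc):
    """Back-project a depth image into a world-frame semantic point cloud.

    depth  : (H, W) depth in meters (0 where no measurement)
    labels : (H, W) per-pixel semantic class ids from the segmentation head
    K      : (3, 3) pinhole camera intrinsics
    T_wc   : (4, 4) camera-to-world pose from the SLAM tracker
    Returns an (N, 4) array of [x, y, z, label] rows.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    valid = depth > 0                      # skip pixels without depth
    z = depth[valid]
    x = (u[valid] - K[0, 2]) * z / K[0, 0] # pinhole back-projection
    y = (v[valid] - K[1, 2]) * z / K[1, 1]
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=0)  # homogeneous, (4, N)
    pts_world = (T_wc @ pts_cam)[:3].T                      # transform to world frame
    return np.hstack([pts_world, labels[valid][:, None]])
```

In the full pipeline, the per-frame point clouds produced this way would be accumulated (and typically voxel-filtered) across keyframes to form the dense semantic map.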
