Abstract

With the rapid development of computer vision and deep learning, researchers have begun to focus on the semantic characteristics of traditional Simultaneous Localization and Mapping (SLAM) in three-dimensional scenes. The point cloud map generated by traditional SLAM methods occupies considerable storage space and does not capture semantic information about the scene, so it cannot meet the requirements of intelligent robot navigation and high-level semantic understanding. To solve this problem, this paper proposes an OctoMap method that fuses semantic information. First, the color and depth images obtained from an RGB-D camera are processed by ORB-SLAM2 to localize the camera. Second, the Convolutional Block Attention Module-Pyramid Scene Parsing Network (CBAM-PSPNet) is introduced to semantically segment the input RGB images, improving segmentation accuracy and extracting high-level semantic information from the environment. Then, a semantic fusion algorithm based on Bayesian fusion is introduced to combine multiview semantic information. Finally, the generated semantic point cloud is inserted into an OctoMap, whose octree data structure compresses the storage space. Experimental results on the ADE20K dataset show that, compared with PSPNet, CBAM-PSPNet improves Mean Pixel Accuracy by 2.55% and Mean Intersection over Union by 1.88%. Experimental results on the TUM dataset show that, compared with point clouds and a traditional OctoMap, the proposed method greatly reduces storage space while achieving voxel-by-voxel dense semantic mapping.
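As a rough illustration of the multiview fusion step mentioned above, the sketch below shows a minimal recursive Bayesian update of per-voxel class probabilities, where each new view contributes the segmentation network's softmax output for the pixel projecting onto that voxel. The function name, class count, and probability values are hypothetical and not taken from the paper; the exact fusion rule used by the authors may differ.

```python
import numpy as np

def fuse_semantic_probabilities(prior, observation):
    """Recursive Bayesian fusion of per-voxel class probabilities.

    prior:       (K,) current class distribution for a voxel
    observation: (K,) softmax output of the segmentation network for the
                 pixel that projects onto this voxel in the current view
    Returns the normalized posterior distribution.
    """
    posterior = prior * observation
    return posterior / posterior.sum()

# Example: one voxel observed from two views (4 hypothetical classes).
num_classes = 4
p = np.full(num_classes, 1.0 / num_classes)   # start from a uniform prior
view1 = np.array([0.70, 0.10, 0.10, 0.10])    # first view favors class 0
view2 = np.array([0.55, 0.25, 0.10, 0.10])    # second view agrees, less strongly
for obs in (view1, view2):
    p = fuse_semantic_probabilities(p, obs)
print(p.argmax(), p)  # class 0 dominates after fusing both views
```

In this scheme, repeated consistent observations sharpen the voxel's label distribution, while conflicting views keep it closer to uniform, which is why multiview fusion tends to be more robust than labeling each voxel from a single frame.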
