Abstract

Unmanned Aerial Vehicles (UAVs) require the ability to robustly perceive surrounding scenes for autonomous navigation. Semantic reconstruction of the scene provides a truly functional understanding of the environment. However, high-performance computing is generally not available on most UAVs, so a lightweight real-time semantic reconstruction method is necessary. Existing methods rely on GPUs, and it is difficult to achieve real-time semantic reconstruction on a CPU. To solve this problem, an indoor dense semantic Simultaneous Localization and Mapping (SLAM) method using CPU computing, named CDSFusion, is proposed in this paper. CDSFusion is the first system to integrate RGBD-based Visual-Inertial Odometry (VIO), semantic segmentation and 3D reconstruction in real time on a CPU. In our VIO method, depth information is introduced to improve the accuracy of pose estimation, and FAST features are used for faster tracking. In our semantic reconstruction method, a pre-trained PSPNet (Pyramid Scene Parsing Network) model is optimized to provide semantic information in real time on the CPU, and the semantic point clouds are fused using Voxblox. The experimental results demonstrate that camera tracking is accelerated without loss of accuracy in our VIO, and a 3D semantic map is reconstructed in real time that is comparable to maps generated by GPU-dependent methods.
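As a rough illustration of the kind of CPU-friendly tracking front-end mentioned above, the sketch below detects FAST corners and tracks them into the next frame with pyramidal Lucas-Kanade optical flow using OpenCV. This is a minimal approximation of the idea, not CDSFusion's actual VIO front-end; the threshold value and file names are placeholders.

```cpp
// Minimal sketch (not the CDSFusion implementation): FAST corner detection
// followed by pyramidal Lucas-Kanade tracking, as a CPU-only front-end idea.
#include <opencv2/opencv.hpp>
#include <cstdio>
#include <vector>

int main() {
    // Placeholder input; any two consecutive grayscale frames work here.
    cv::Mat prev = cv::imread("frame_0.png", cv::IMREAD_GRAYSCALE);
    cv::Mat curr = cv::imread("frame_1.png", cv::IMREAD_GRAYSCALE);
    if (prev.empty() || curr.empty()) return 1;

    // FAST is cheaper than Shi-Tomasi or ORB, which is why it suits CPU-only tracking.
    std::vector<cv::KeyPoint> keypoints;
    cv::FAST(prev, keypoints, /*threshold=*/20, /*nonmaxSuppression=*/true);

    std::vector<cv::Point2f> prev_pts, curr_pts;
    cv::KeyPoint::convert(keypoints, prev_pts);

    // Track the detected corners into the next frame with pyramidal LK optical flow.
    std::vector<uchar> status;
    std::vector<float> err;
    cv::calcOpticalFlowPyrLK(prev, curr, prev_pts, curr_pts, status, err);

    // Keep only successfully tracked points; these would feed the VIO back-end.
    std::vector<cv::Point2f> tracked;
    for (size_t i = 0; i < status.size(); ++i)
        if (status[i]) tracked.push_back(curr_pts[i]);

    std::printf("tracked %zu of %zu features\n", tracked.size(), prev_pts.size());
    return 0;
}
```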

Highlights

  • The purpose of dense semantic Simultaneous Localization and Mapping (SLAM) is to localize the robot and reconstruct a dense 3D semantic map simultaneously

  • There are many geometric reconstruction methods, such as Structure from Motion (SfM) [1], Multi-View Stereo (MVS) [2] and SLAM [3]; semantic segmentation methods based on deep learning (DL), such as [4,5,6], did not intersect with geometric reconstruction for a long time

  • The system takes the measurements as input and is composed of three modules: the input measurements are first processed in an RGBD-based Visual-Inertial Odometry (VIO) module to estimate poses, which provides a highly accurate and globally consistent trajectory estimation

Summary

Introduction

The purpose of dense semantic Simultaneous Localization and Mapping (SLAM) is to localize the robot and reconstruct a dense 3D semantic map simultaneously. The VIO of CDSFusion is based on the state-of-the-art VINS-Mono [14], and depth information from an RGBD camera is introduced to improve the robustness and accuracy of the system. CDSFusion has a modular design with three key modules: an RGBD-based VIO module for pose estimation, an optimized lightweight semantic segmentation module for real-time segmentation, and a 3D reconstruction module that integrates semantic information for model generation. CDSFusion can fall back to a fast, robust, and accurate VIO, a lightweight semantic segmentation solution, or a dense 3D reconstruction method. Using consistent pose and semantic information, the components are organically combined to improve the efficiency of the system on a CPU while maintaining localization and reconstruction accuracy.
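The modular composition described above can be pictured with the following skeleton. All class and method names (VioModule, SegmentationModule, ReconstructionModule, and their members) are hypothetical placeholders invented for illustration; they are not CDSFusion's actual interfaces.

```cpp
// Hypothetical skeleton of a three-module pipeline in the spirit of the design
// described above; every class and method name here is invented for illustration.
#include <cstdio>

struct Pose { double t[3]; double q[4]; };       // position + orientation
struct Frame { /* RGB image, depth image, IMU batch would live here */ };
struct LabelImage { /* per-pixel semantic class ids */ };

struct VioModule {                               // RGBD-based VIO: frame -> pose
    Pose estimatePose(const Frame&) { return Pose{}; }
};
struct SegmentationModule {                      // lightweight CPU segmentation
    LabelImage segment(const Frame&) { return LabelImage{}; }
};
struct ReconstructionModule {                    // fuses labeled depth into a 3D map
    void integrate(const Frame&, const Pose&, const LabelImage&) {}
};

int main() {
    VioModule vio;
    SegmentationModule seg;
    ReconstructionModule recon;

    // Each incoming RGBD+IMU frame flows through the three modules in order:
    // pose estimation, semantic segmentation, then semantic fusion into the map.
    for (int k = 0; k < 10; ++k) {
        Frame frame;                             // placeholder for sensor input
        Pose pose = vio.estimatePose(frame);
        LabelImage labels = seg.segment(frame);
        recon.integrate(frame, pose, labels);
    }
    std::puts("processed 10 frames through the pipeline sketch");
    return 0;
}
```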

Related Work
Method
The Architecture
Visual-Inertial Odometry
Semantic Segmentation
Experiments
Feature Points Comparison
Dense Semantic SLAM
Findings
Conclusions