CS-SLAM: A Lightweight Semantic SLAM Method for Dynamic Scenarios
- Book Chapter
- 10.1007/978-3-030-36204-1_21
- Jan 1, 2019
Semantic SLAM has been a hot research topic in computer vision in recent years. Mainstream semantic SLAM methods can perform real-time semantic extraction; however, they do not run properly on resource-constrained platforms. This paper proposes LLN-SLAM, a lightweight semantic SLAM method for portable devices. The method extracts semantic information by matching object detections against point cloud segmentation projections. To keep the running speed high, the lightweight MobileNet network is used for object detection and Euclidean distance clustering is applied for point cloud segmentation. In a typical augmented reality scenario, there is no way to prevent people other than the user from moving through the scene, which introduces large errors into visual positioning. Semantic information is therefore used to assist positioning: the algorithm does not extract features on dynamic semantic objects. Experimental results show that the method runs stably on portable devices and that the positioning error caused by moving objects is effectively corrected while the environmental semantic map is built.
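The Euclidean distance clustering step mentioned above can be sketched as a simple region-growing pass over the point cloud; the tolerance value and the points below are illustrative, not the paper's settings:

```python
import numpy as np

def euclidean_cluster(points, tol=0.5):
    """Group 3D points into clusters: two points share a cluster if they
    are connected by a chain of neighbors closer than `tol`."""
    n = len(points)
    labels = np.full(n, -1, dtype=int)
    current = 0
    for seed in range(n):
        if labels[seed] != -1:
            continue
        stack = [seed]
        labels[seed] = current
        while stack:
            i = stack.pop()
            # All unlabeled points within `tol` of point i join the cluster.
            dists = np.linalg.norm(points - points[i], axis=1)
            for j in np.where((dists < tol) & (labels == -1))[0]:
                labels[j] = current
                stack.append(j)
        current += 1
    return labels

# Two well-separated blobs should yield two clusters.
pts = np.array([[0, 0, 0], [0.1, 0, 0], [0.2, 0.1, 0],
                [5, 5, 5], [5.1, 5, 5]])
print(euclidean_cluster(pts, tol=0.5))  # [0 0 0 1 1]
```

A production system would use a k-d tree for the neighbor query (as PCL's Euclidean cluster extraction does) rather than the brute-force distance scan shown here.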
- Research Article
4
- 10.1108/ir-09-2022-0236
- Jan 27, 2023
- Industrial Robot: the international journal of robotics research and application
Purpose: The prerequisite for most traditional visual simultaneous localization and mapping (V-SLAM) algorithms is that most objects in the environment should be static or in low-speed locomotion. These algorithms rely on geometric information of the environment, which restricts application in scenarios with dynamic objects. Semantic segmentation can be used to extract deep features from images to identify dynamic objects in the real world. Therefore, V-SLAM fused with semantic information can reduce the influence of dynamic objects and achieve higher accuracy. This paper aims to present a new semantic stereo V-SLAM method for outdoor dynamic environments with more accurate pose estimation.
Design/methodology/approach: First, the Deeplabv3+ semantic segmentation model is adopted to recognize semantic information about dynamic objects in outdoor scenes. Second, an approach that combines prior knowledge to determine the dynamic hierarchy of movable objects is proposed, which depends on the pixel movement between frames. Finally, a semantic stereo V-SLAM based on ORB-SLAM2 is presented that calculates accurate trajectories in dynamic environments by selecting feature points in static regions and eliminating feature points in dynamic regions.
Findings: The proposed method is verified on the public KITTI dataset and a ZED2 self-collected dataset in the real world. The proposed V-SLAM system can extract semantic information and track feature points steadily in dynamic environments. Absolute pose error and relative pose error are used to evaluate the feasibility of the proposed method. Experimental results show significant improvements in root mean square error and standard deviation on both the KITTI dataset and an unmanned aerial vehicle, indicating that the method can be effectively applied to outdoor environments.
Originality/value: The main contribution of this study is a new semantic stereo V-SLAM method with greater robustness and stability, which reduces the impact of moving objects in dynamic scenes.
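The dynamic-hierarchy idea above (grading movable objects by their pixel movement between frames) can be illustrated with a minimal sketch; the centroid-shift measure and the `low`/`high` thresholds are assumptions for illustration, not the paper's actual criterion:

```python
import numpy as np

def dynamic_level(mask_prev, mask_curr, low=2.0, high=10.0):
    """Classify a movable object's dynamic level from the shift (in
    pixels) of its segmentation-mask centroid between two frames.
    Thresholds `low`/`high` are illustrative, not the paper's values."""
    c_prev = np.argwhere(mask_prev).mean(axis=0)
    c_curr = np.argwhere(mask_curr).mean(axis=0)
    shift = np.linalg.norm(c_curr - c_prev)
    if shift < low:
        return "static"
    if shift < high:
        return "low-dynamic"
    return "high-dynamic"

# A parked car: its mask barely moves between consecutive frames.
m1 = np.zeros((100, 100), bool); m1[40:60, 40:60] = True
m2 = np.zeros((100, 100), bool); m2[41:61, 40:60] = True
print(dynamic_level(m1, m2))  # static (centroid shifts ~1 px)
```

Feature points inside masks graded "high-dynamic" would then be excluded from pose estimation, while "static" movable objects keep contributing features.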
- Conference Article
8
- 10.1109/iros51168.2021.9636271
- Sep 27, 2021
Recent Semantic SLAM methods combine classical geometry-based estimation with deep learning-based object detection or semantic segmentation. In this paper we evaluate the quality of semantic maps generated by state-of-the-art class- and instance-aware dense semantic SLAM algorithms whose codes are publicly available and explore the impacts both semantic segmentation and pose estimation have on the quality of semantic maps. We obtain these results by providing algorithms with ground-truth pose and/or semantic segmentation data available from simulated environments. We establish that semantic segmentation is the largest source of error through our experiments, dropping mAP and OMQ performance by up to 74.3% and 71.3% respectively.
- Conference Article
- 10.1109/yac.2019.8787672
- Jun 1, 2019
Dense simultaneous localization and mapping has attracted attention in recent years. However, it usually produces a large map, which increases storage space and can leave the map incomplete. In this paper, we design a semantic SLAM system that reduces map storage space while improving map integrity. The key idea is to segment objects from the background into individual models using a deep neural network and to reconstruct models of the same class with a common map storage space. We build a complete dense semantic system and propose a method to match two instances of the same object at large distances.
- Research Article
1
- 10.1088/2631-8695/adeee8
- Jul 28, 2025
- Engineering Research Express
In semantic network-based dynamic scene visual SLAM methods, the masks generated by semantic networks often over-cover dynamic objects, resulting in parts of the static background being mistakenly included in the masks. Furthermore, the system fails to identify passively moving potential dynamic objects outside the masks, which impacts the pose estimation effect. To address these challenges, this paper proposes a mask-filtering method that integrates image depth information. By utilizing the consistency of object depth, this approach effectively eliminates background regions mistakenly included in the mask. Subsequently, we developed a regional dynamic point rejection strategy and proposed an anomaly detection method based on an adaptive Gaussian model. By constructing an adaptive Gaussian model for dynamic points inside the mask, setting dynamic thresholds, and updating the model parameters, this method can detect potential dynamic points outside the mask. It effectively reduces the impact of potential dynamic objects on the system’s pose estimation. Finally, this paper presents a dynamic Gaussian SLAM system based on the ORB-SLAM3 framework, named DG-SLAM. This system is used for pose estimation and dense point cloud construction, improving localization accuracy in dynamic scenes. To validate the performance of the DG-SLAM system, we conducted tests on the TUM RGBD dataset and in real-world scenarios, comparing it with ORB-SLAM3, DynaSLAM, and SG-SLAM. Compared to ORB-SLAM3, DG-SLAM achieves an average improvement of 94.54% and a minimum improvement of 89.34% in positioning accuracy on the TUM and Bonn datasets. Experimental results show that DG-SLAM can effectively detect and eliminate the interference of potential dynamic feature points in the scene, achieving good localization accuracy and strong robustness in various dynamic environments.
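The adaptive Gaussian model described above can be sketched as an online mean/variance estimate with a k-sigma dynamic threshold; the scalar statistic being modeled, the gain `alpha` and the threshold `k` are illustrative assumptions, not DG-SLAM's actual parameters:

```python
import math

class AdaptiveGaussian:
    """Online Gaussian model of a scalar dynamic-point statistic
    (e.g. a reprojection residual). Points beyond k*sigma are flagged
    as potentially dynamic; inliers update the model parameters."""
    def __init__(self, k=3.0, alpha=0.05):
        self.mean, self.var = 0.0, 1.0
        self.k, self.alpha = k, alpha

    def is_dynamic(self, x):
        return abs(x - self.mean) > self.k * math.sqrt(self.var)

    def update(self, x):
        if self.is_dynamic(x):
            return True                  # outlier: don't pollute the model
        d = x - self.mean                # exponential update of mean/variance
        self.mean += self.alpha * d
        self.var = (1 - self.alpha) * self.var + self.alpha * d * d
        return False

g = AdaptiveGaussian()
for r in [0.1, -0.2, 0.05, 0.15]:        # typical static-point residuals
    g.update(r)
print(g.is_dynamic(9.0))  # True: far outside the adapted 3-sigma band
```

Because the threshold tracks the current mean and variance, it tightens in calm scenes and loosens under noise, which is the behavior the abstract attributes to the adaptive model.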
- Research Article
2
- 10.1088/1361-6501/ace988
- Jul 31, 2023
- Measurement Science and Technology
In visual simultaneous localization and mapping (SLAM) systems, the limitations of the scene-rigidity assumption are usually overcome with learning-based or geometry-based methods. However, learning-based methods usually have a high time cost, and geometry-based methods usually do not produce the clean maps needed for advanced robotic applications. In this paper, an RGB-D SLAM for indoor dynamic environments is proposed with two channels that classify frames as slightly or highly dynamic based on matching accuracy. A geometric constraint based on Hamming distance is proposed to improve the reliability of matching accuracy as a basis for scenario classification. Dynamic features are detected by an affine consistency constraint and a semantic method; the semantic method is used only for highly dynamic scenarios, which reduces the time cost of dynamic feature detection and provides a basis for mapping. Furthermore, an improved adaptive threshold algorithm is proposed to improve the robustness of feature matching. The proposed method is evaluated on the TUM RGB-D dataset and in a real scenario. The experimental results demonstrate that the proposed method achieves highly accurate tracking with reasonable time cost in both slightly and highly dynamic indoor environments while producing effective maps.
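A Hamming-distance constraint on binary descriptor matches, as used above for scenario classification, can be sketched as follows; the fixed `max_dist` threshold is an illustrative stand-in for the paper's adaptive threshold:

```python
def hamming(d1, d2):
    """Hamming distance between two binary descriptors given as bytes
    (count of differing bits, via XOR and popcount)."""
    return sum(bin(a ^ b).count("1") for a, b in zip(d1, d2))

def filter_matches(matches, descs1, descs2, max_dist=50):
    """Keep only matches whose descriptor Hamming distance is below
    `max_dist`; an ORB descriptor is 32 bytes, shortened to 8 here."""
    return [(i, j) for i, j in matches
            if hamming(descs1[i], descs2[j]) < max_dist]

# Toy 8-byte "descriptors": one identical pair, one maximally different.
d1 = [bytes([0b10101010] * 8), bytes([0x00] * 8)]
d2 = [bytes([0b10101010] * 8), bytes([0xFF] * 8)]
print(filter_matches([(0, 0), (1, 1)], d1, d2))  # [(0, 0)]
```

The surviving match ratio can then serve as the matching-accuracy signal that decides between the slightly and highly dynamic processing channels.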
- Conference Article
- 10.1109/icaica52286.2021.9497930
- Jun 28, 2021
Perception of the environment is an important part of robot intelligence. To interact with the environment effectively, a robot should know not only the shape of objects but also their semantics. To meet diverse needs, robot products are becoming increasingly miniaturized, and the related technologies have become research hotspots in the field. In response, this paper focuses on speed optimization of an existing semantic map construction method to make it suitable for embedded systems. We improve the semantic segmentation stage and use TensorRT to build a fast inference engine that accelerates target detection on embedded devices. A Bayesian fusion method fuses the semantic information observed at different locations to build an accurate map. Finally, to evaluate the real-time performance and effectiveness of this method, a test on the ADE20K dataset was carried out, and the experimental results confirm the effectiveness of the optimization.
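The Bayesian fusion of semantic observations from different locations can be sketched as a per-cell posterior update; the class probabilities below are made up for illustration:

```python
import numpy as np

def bayes_fuse(prior, likelihood):
    """Fuse a new per-class semantic observation into a map cell's
    label distribution: posterior is proportional to prior * likelihood,
    renormalized to sum to 1."""
    post = prior * likelihood
    return post / post.sum()

# Three classes; two noisy observations that both favor class 1.
p = np.array([1 / 3, 1 / 3, 1 / 3])        # uniform prior
for obs in ([0.2, 0.7, 0.1], [0.3, 0.6, 0.1]):
    p = bayes_fuse(p, np.array(obs))
print(p.argmax())  # 1
```

Repeated agreeing observations sharpen the distribution, so transient segmentation errors at a single viewpoint are averaged out in the fused map.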
- Research Article
3
- 10.1371/journal.pone.0261053
- Dec 8, 2021
- PLoS ONE
Accurate and reliable state estimation and mapping are the foundation of most autonomous driving systems. In recent years, researchers have focused on pose estimation through geometric feature matching. However, most works in the literature assume a static scenario, and registration based on geometric features is vulnerable to interference from dynamic objects, resulting in a decline in accuracy. With the development of deep semantic segmentation networks, we can conveniently obtain semantic information from the point cloud in addition to geometric information. Semantic features can complement geometric features to improve the performance of odometry and loop closure detection. In realistic environments, semantic information can filter out dynamic objects in the data, such as pedestrians and vehicles, which otherwise cause redundancy in the generated map and map-based localization failure. In this paper, we propose LiDAR inertial odometry with loop closure combined with semantic information (LIO-CSI), which integrates semantic information into the front-end process as well as loop closure detection. First, we perform a local optimization on the semantic labels provided by the Sparse Point-Voxel Neural Architecture Search (SPVNAS) network. The optimized semantic information is incorporated into the front-end process of tightly-coupled LiDAR inertial odometry via smoothing and mapping (LIO-SAM), which allows us to filter dynamic objects and improve the accuracy of point cloud registration. Then, we propose a semantic-assisted scan-context method to improve the accuracy and robustness of loop closure detection. The experiments were conducted on the widely used KITTI dataset and a self-collected dataset from the Jilin University (JLU) campus. The experimental results demonstrate that our method outperforms the purely geometric method, especially in dynamic scenarios, and has good generalization ability.
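The dynamic-object filtering step above (removing pedestrians and vehicles before registration) reduces to masking points by semantic label; the label IDs below are hypothetical, not the SPVNAS/SemanticKITTI label map:

```python
import numpy as np

# Semantic filtering before registration: drop points whose label
# belongs to a dynamic class. The IDs are made up for this sketch;
# a real system would use its network's actual label mapping.
DYNAMIC_LABELS = {10, 11, 30}            # e.g. car, truck, person

def remove_dynamic(points, labels):
    """Return only the points whose semantic label is static."""
    keep = ~np.isin(labels, list(DYNAMIC_LABELS))
    return points[keep]

pts = np.arange(15, dtype=float).reshape(5, 3)
lab = np.array([10, 40, 30, 40, 11])     # 40 = some static class
print(len(remove_dynamic(pts, lab)))     # 2
```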
- Research Article
18
- 10.3390/s23031502
- Jan 29, 2023
- Sensors (Basel, Switzerland)
Monocular cameras and LiDAR are the two most commonly used sensors in unmanned vehicles, and combining their advantages is a current research focus of SLAM and semantic analysis. In this paper, we propose an improved SLAM and semantic reconstruction method based on the fusion of LiDAR and monocular vision. We fuse the semantic image with the low-resolution 3D LiDAR point clouds and generate dense semantic depth maps. Through visual odometry, ORB feature points with depth information are selected to improve positioning accuracy. Our method uses parallel threads to aggregate 3D semantic point clouds while positioning the unmanned vehicle. Experiments are conducted on the public CityScapes and KITTI Visual Odometry datasets, and the results show that, compared with ORB-SLAM2 and DynaSLAM, our positioning error is reduced by approximately 87%; compared with DEMO and DVL-SLAM, our positioning accuracy improves in most sequences. Our 3D reconstruction quality is better than that of DynaSLAM and contains semantic information. The proposed method has engineering application value in the unmanned vehicle field.
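Fusing a semantic image with sparse LiDAR, as described above, starts by projecting each LiDAR point through the camera model and reading the label under its projection; the intrinsics `K` and the image below are illustrative, not a real calibration:

```python
import numpy as np

def project_lidar_to_image(points_cam, K, sem_img):
    """Project LiDAR points (already transformed into the camera frame)
    with a pinhole model K, and attach the semantic label found under
    each projected pixel."""
    labeled = []
    for X, Y, Z in points_cam:
        if Z <= 0:
            continue                      # behind the camera
        u = int(K[0, 0] * X / Z + K[0, 2])
        v = int(K[1, 1] * Y / Z + K[1, 2])
        if 0 <= v < sem_img.shape[0] and 0 <= u < sem_img.shape[1]:
            labeled.append((u, v, float(Z), int(sem_img[v, u])))
    return labeled

K = np.array([[500., 0., 320.], [0., 500., 240.], [0., 0., 1.]])
sem = np.zeros((480, 640), int); sem[:, 320:] = 7   # right half = class 7
pts = np.array([[1.0, 0.0, 5.0]])                   # projects right of center
print(project_lidar_to_image(pts, K, sem))  # [(420, 240, 5.0, 7)]
```

Densifying these sparse labeled depths into full semantic depth maps (as the paper does) would require an additional interpolation or completion step not shown here.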
- Research Article
- 10.1016/j.displa.2024.102892
- Nov 23, 2024
- Displays
DHDP-SLAM: Dynamic Hierarchical Dirichlet Process based data association for semantic SLAM
- Conference Article
14
- 10.1109/iros40897.2019.8967921
- Nov 1, 2019
We present a novel dataset for training and benchmarking semantic SLAM methods. The dataset consists of 200 long sequences, each one containing 3000-5000 data frames. We generate the sequences using realistic home layouts. For that we sample trajectories that simulate motions of a simple home robot, and then render the frames along the trajectories. Each data frame contains a) RGB images generated using physically-based rendering, b) simulated depth measurements, c) simulated IMU readings and d) ground truth occupancy grid of a house. Our dataset serves a wider range of purposes compared to existing datasets and is the first large-scale benchmark focused on the mapping component of SLAM. The dataset is split into train/validation/test parts sampled from different sets of virtual houses. We present benchmarking results for both classical geometry-based [1], [2] and recent learning-based [3] SLAM algorithms, a baseline mapping method [4], semantic segmentation [5] and panoptic segmentation [6]. The dataset and source code for reproducing our experiments will be publicly available at the time of publication.
- Research Article
3
- 10.5194/isprs-archives-xliii-b2-2021-399-2021
- Jun 28, 2021
- The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Abstract. In this paper we present a semantic SLAM method based on a bundle of deep convolutional neural networks. It provides real-time dense semantic scene reconstruction for the autonomous driving system of an off-road robotic vehicle. Most state-of-the-art neural networks require large computing resources that exceed the capabilities of many robotic platforms. We propose an architecture for 3D semantic scene reconstruction that builds on recent progress in computer vision by integrating SuperPoint, SuperGlue, Bi3D, DeepLabV3+, RTM3D and an additional module with pre-processing, inference and post-processing operations performed on the GPU. We also updated our simulated dataset for semantic segmentation and added disparity images.
- Conference Article
- 10.1109/ccpqt56151.2022.00016
- Aug 1, 2022
Most maps constructed with traditional visual SLAM technology are sparse maps that contain only geometric information and no semantic information, which limits the robot's ability to perform tasks requiring scene understanding. In this paper, we propose a vision-based semantic SLAM method. The visual odometry is optimized by using semantic information to remove the influence of dynamic objects in the scene. Based on the proposed method, we can finally construct a semantic map. Experiments show that our system effectively improves positioning and mapping accuracy.
- Research Article
217
- 10.1109/access.2021.3050617
- Jan 1, 2021
- IEEE Access
Scene rigidity is a strong assumption in typical visual Simultaneous Localization and Mapping (vSLAM) algorithms. Such a strong assumption limits the usage of most vSLAM systems in dynamic real-world environments, which are the target of several relevant applications such as augmented reality, semantic mapping, unmanned autonomous vehicles, and service robotics. Many solutions use semantic segmentation methods (e.g., Mask R-CNN, SegNet) to detect dynamic objects and remove outliers. However, as far as we know, such methods wait for the semantic results in the tracking thread, and the processing time depends on the segmentation method used. In this paper, we present RDS-SLAM, a real-time visual dynamic SLAM algorithm that is built on ORB-SLAM3 and adds a semantic thread and a semantic-based optimization thread for robust tracking and mapping in dynamic environments in real time. These novel threads run in parallel with the others, so the tracking thread no longer needs to wait for semantic information. In addition, we propose an algorithm to obtain the latest available semantic information, making it possible to use segmentation methods of different speeds in a uniform way. We update and propagate semantic information using a moving probability, which is saved in the map and used to remove outliers from tracking via a data association algorithm. Finally, we evaluate the tracking accuracy and real-time performance using the public TUM RGB-D datasets and a Kinect camera in dynamic indoor scenarios. Source code and demo: https://github.com/yubaoliu/RDS-SLAM.git.
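The moving-probability update described above can be sketched as a binary Bayes filter in odds form; the likelihood ratio `odds` and the 0.8 cutoff are illustrative assumptions, not RDS-SLAM's actual values:

```python
def update_moving_prob(p, observed_dynamic, odds=4.0):
    """Binary Bayes update of a map point's moving probability.
    `observed_dynamic` says whether the point fell on a dynamic
    segmentation region in the latest keyframe."""
    l = odds if observed_dynamic else 1.0 / odds   # likelihood ratio
    o = (p / (1.0 - p)) * l                        # update in odds form
    return o / (1.0 + o)                           # back to probability

p = 0.5                                  # uninformed prior for a new point
for obs in [True, True, False, True]:    # mostly seen on dynamic masks
    p = update_moving_prob(p, obs)
print(p > 0.8)  # True: the point would be excluded from tracking
```

Storing the probability in the map (rather than a hard per-frame mask) lets late-arriving segmentation results from the parallel semantic thread refine earlier decisions.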
- Conference Article
19
- 10.1109/aim.2019.8868400
- Jul 1, 2019
Traditional visual SLAM algorithms run robustly under the assumption of a static environment but often fail in dynamic scenarios, since moving objects impair camera pose tracking. This study proposes a novel semantic SLAM framework for RGB-D cameras that detects potentially moving elements with Mask R-CNN to achieve robustness in dynamic scenes. In the framework, semantic instance segmentation runs as an independent thread in parallel with the three other threads: tracking, local mapping and loop closing. While most methods use only multi-view geometry to decide whether segmented regions are moving, the proposed method simultaneously estimates the camera motion and the probability that each part is dynamic or static. Experiments compare the proposed method with state-of-the-art approaches on the TUM RGB-D datasets. Results demonstrate that the proposed method improves the accuracy of the absolute trajectory in dynamic scenes.