Robust 3D Semantic Segmentation With Incomplete Point Clouds Based on Sequential Frame Sampling
This paper proposes a method for learning 3D semantic segmentation that is robust to incomplete point clouds. Our method first generates pseudo-incomplete point clouds from original 3D point clouds by sequential frame sampling, which creates multiple subsets that exploit the continuity of an RGB-D sequence to reproduce incomplete areas in the point clouds. It then simultaneously trains completion networks and semantic segmentation networks on the pseudo-incomplete point clouds. We evaluate our method on the 3D semantic segmentation task. Experimental results on ScanNet v2, an indoor environment, show that our method improves mIoU by 0.4 points on the original point clouds and 6.3 points on the incomplete point clouds compared with a conventional method. Experimental results on the WorkPlace Dataset, an outdoor environment, show that our method improves mIoU by 6.5 points on the original point clouds and 11.1 points on the incomplete point clouds compared with the conventional method. These improvements benefit the safety and operability of environmental awareness in applications such as robotics.
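The mIoU gains quoted in this abstract (and in several others in this list) follow the standard definition of mean Intersection over Union; as a minimal sketch, assuming integer class labels stored in NumPy arrays:

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Mean Intersection over Union across semantic classes.

    Classes absent from both prediction and ground truth are
    skipped so they do not distort the average.
    """
    ious = []
    for c in range(num_classes):
        inter = np.sum((pred == c) & (gt == c))
        union = np.sum((pred == c) | (gt == c))
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))
```

A gain of "0.4 points" in this metric means the class-averaged IoU rose by 0.4 percentage points.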
- Research Article
- 10.5194/isprs-archives-xlviii-1-w2-2023-209-2023
- Dec 13, 2023
- The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Abstract. Creating virtual duplicates of the real world has garnered significant attention due to its applications in areas such as autonomous driving, urban planning, and urban mapping. One of the critical tasks in the computer vision community is semantic segmentation of outdoor collected point clouds. The development and research of robust semantic segmentation algorithms heavily rely on precise and comprehensive benchmark datasets. In this paper, we present the York University Teledyne Optech 3D Semantic Segmentation Dataset (YUTO Semantic), a multi-mission large-scale aerial LiDAR dataset specifically designed for 3D point cloud semantic segmentation. The dataset comprises approximately 738 million points, covering an area of 9.46 square kilometers, which results in a high point density of 100 points per square meter. Each point in the dataset is annotated with one of nine semantic classes. Additionally, we conducted performance tests of state-of-the-art algorithms to evaluate their effectiveness in semantic segmentation tasks. The YUTO Semantic dataset serves as a valuable resource for advancing research in 3D point cloud semantic segmentation and contributes to the development of more accurate and robust algorithms for real-world applications. The dataset is available at https://github.com/Yacovitch/YUTO_Semantic.
- Research Article
- 10.3390/s22166210
- Aug 18, 2022
- Sensors (Basel, Switzerland)
Mobile light detection and ranging (LiDAR) sensor point clouds are used in many fields, such as road network management, architecture and urban planning, and 3D High Definition (HD) city maps for autonomous vehicles. Semantic segmentation of mobile point clouds is critical for these tasks. In this study, we present a robust and effective deep learning-based point cloud semantic segmentation method. Semantic segmentation is applied to range images produced from the point cloud by spherical projection: the irregular 3D mobile point clouds are transformed into a regular form by projecting them onto a plane to generate a 2D representation of the point cloud. This representation is fed to the proposed network, which produces the semantic segmentation. A local geometric feature vector is calculated for each point. Optimum-parameter experiments were also performed to obtain the best semantic segmentation results. The proposed technique, called SegUNet3D, is an ensemble approach that combines the U-Net and SegNet algorithms. The SegUNet3D algorithm was compared with five different segmentation algorithms on two challenging datasets: SemanticPOSS covers an urban area, whereas RELLIS-3D covers an off-road environment. The study demonstrated that the proposed approach is superior to the other methods in terms of mean Intersection over Union (mIoU) on both datasets, improving the mIoU metric by up to 15.9% on SemanticPOSS and up to 5.4% on RELLIS-3D.
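The spherical-projection step described above, which turns an irregular point cloud into a regular range image, can be sketched as follows; the image resolution and vertical field-of-view values are illustrative assumptions (typical of a 64-beam LiDAR), not SegUNet3D's actual settings:

```python
import numpy as np

def spherical_projection(points, h=64, w=1024, fov_up=3.0, fov_down=-25.0):
    """Project an (N, 3) point cloud onto an (h, w) range image.

    fov_up / fov_down give the sensor's vertical field of view in
    degrees; the defaults here are assumptions, not the paper's values.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)            # range of each point
    yaw = np.arctan2(y, x)                        # azimuth angle
    pitch = np.arcsin(np.clip(z / np.maximum(r, 1e-8), -1.0, 1.0))
    fov_up_r, fov_down_r = np.radians(fov_up), np.radians(fov_down)
    # Normalize angles to [0, 1] image coordinates.
    u = 0.5 * (1.0 - yaw / np.pi)                 # column: azimuth
    v = (fov_up_r - pitch) / (fov_up_r - fov_down_r)  # row: elevation
    cols = np.clip((u * w).astype(int), 0, w - 1)
    rows = np.clip((v * h).astype(int), 0, h - 1)
    image = np.zeros((h, w), dtype=np.float32)
    image[rows, cols] = r                         # keep last-written range
    return image
```

Each pixel stores the range of the point that falls into it, yielding the regular 2D input a U-Net/SegNet-style network expects.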
- Research Article
- 10.1016/j.imavis.2021.104129
- Mar 3, 2021
- Image and Vision Computing
HCFS3D: Hierarchical coupled feature selection network for 3D semantic and instance segmentation
- Research Article
- 10.1016/j.autcon.2020.103206
- May 8, 2020
- Automation in Construction
Robust segmentation and localization of structural planes from photogrammetric point clouds in construction sites
- Research Article
- 10.5194/isprs-archives-xlii-2-w13-785-2019
- Jun 5, 2019
- The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Abstract. With the rapid development of new indoor sensors and acquisition techniques, the number of indoor three-dimensional (3D) point cloud models has increased significantly. However, these massive “blind” point clouds struggle to satisfy the demands of many location-based indoor applications and GIS analyses. The robust semantic segmentation of 3D point clouds remains a challenge. In this paper, a segmentation with layout estimation network (SLENet)-based 2D–3D semantic transfer method is proposed for the robust segmentation of image-based indoor 3D point clouds. Firstly, a SLENet is devised to simultaneously obtain semantic labels and indoor spatial layout estimates from 2D images. A pixel labeling pool is then constructed to incorporate the visual graphical model and realize efficient 2D–3D semantic transfer for 3D point clouds, which avoids time-consuming pixel-wise label transfer and reprojection error. Finally, a 3D contextual refinement, which explores extra-image consistency with 3D constraints, is developed to suppress the labeling contradictions caused by multi-superpixel aggregation. The experiments were conducted on an open dataset (the NYUDv2 indoor dataset) and a local dataset. In comparison with state-of-the-art 2D semantic segmentation methods, SLENet learns features discriminative enough for inter-class segmentation while preserving clear boundaries for intra-class segmentation. Building on the strength of SLENet, the final 3D semantic segmentation tested on the point cloud created from the local image dataset reaches a total accuracy of 89.97%, with both the object semantics and indoor structural information expressed.
- Conference Article
- 10.1109/rcar52367.2021.9517340
- Jul 15, 2021
Three-dimensional (3D) semantic segmentation is important in many scenarios, such as autonomous driving, robotic navigation, etc. Random point sampling proves to be computationally and memory efficient for tackling large-scale point clouds in semantic segmentation. However, information about small objects or object edges may be lost. Instead of down-sampling the point cloud directly, in this paper we propose a dilated nearest-neighbor encoding module that enlarges the receptive field to learn more 3D geometric information. To further reduce the number of layers relative to previous neural networks, we design a multi-level hierarchical feature fusion network. We present an end-to-end 3D semantic segmentation framework based on the RandLA-Net backbone and these two key components, dilated convolution and efficient feature fusion. Experiments on the benchmark 3D dataset show that our framework outperforms other state-of-the-art approaches with fewer network layers.
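The dilated nearest-neighbor idea, widening the receptive field without growing the neighborhood size, can be illustrated with a brute-force sketch; `dilated_knn` and its parameters are hypothetical names for illustration, not the paper's implementation:

```python
import numpy as np

def dilated_knn(points, k=8, dilation=2):
    """Dilated nearest-neighbour indices for each point in (N, d).

    Instead of the k closest points, take every `dilation`-th neighbour
    among the k * dilation closest, enlarging the receptive field while
    keeping the neighbourhood size fixed. Brute-force distances; a
    KD-tree would be used at scale.
    """
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    order = np.argsort(d2, axis=1)     # nearest first (self at position 0)
    # Skip self, then stride through the widened candidate set.
    return order[:, 1:1 + k * dilation:dilation]
```

With `dilation=1` this reduces to ordinary k-nearest neighbours; larger dilation gathers geometric context from farther away at the same cost per point.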
- Research Article
- 10.3390/rs16030453
- Jan 24, 2024
- Remote Sensing
Since camera and LiDAR sensors provide complementary information for the 3D semantic segmentation of intelligent vehicles, extensive efforts have been invested in fusing information from multi-modal data. Despite considerable advantages, fusion-based methods still have inevitable limitations: a field-of-view disparity between the two modal inputs, a demand for precisely paired data as inputs in both the training and inference stages, and higher resource consumption. These limitations pose significant obstacles to the practical application of fusion-based methods in real-world scenarios. Therefore, we propose a robust 3D semantic segmentation method based on multi-modal collaborative learning, aiming to enhance feature extraction and segmentation performance for point clouds. In practice, an attention-based cross-modal knowledge distillation module is proposed to effectively acquire comprehensive information from multi-modal data and guide the pure point cloud network; then, a confidence-map-driven late fusion strategy is proposed to dynamically fuse the results of the two modalities at the pixel level to complement their advantages and further optimize segmentation results. The proposed method is evaluated on two public datasets (the urban dataset SemanticKITTI and the off-road dataset RELLIS-3D) and our unstructured test set. The experimental results demonstrate performance competitive with state-of-the-art methods in diverse scenarios and robustness to sensor faults.
- Research Article
- 10.1080/17538947.2025.2528811
- Jul 22, 2025
- International Journal of Digital Earth
The application of LiDAR point clouds to urban environment analysis has become a critical approach in urban scene understanding. Concurrently, substantial progress has been made in 3D point cloud semantic segmentation, advancing the precision and effectiveness of urban scene interpretation. However, existing methods face challenges when handling long-range LiDAR point clouds, where reduced point density and increased noise at greater distances result in segmentation errors and diminished accuracy. To this end, we propose PASeg, which incorporates two key components: the Positional-Guided Classifier (PGC) and the Multimodal Semantic Alignment (MSA) module. The PGC uses positional embeddings to dynamically adjust normalization parameters, thereby improving segmentation accuracy across varying distances. The MSA module aligns semantic features from text, image, and point cloud data, facilitating better category differentiation. The interaction between PGC and MSA synergistically strengthens large-scale 3D semantic segmentation. Extensive experiments on the SemanticKITTI and nuScenes datasets demonstrate that PASeg’s overall segmentation performance is competitive with state-of-the-art methods. Notably, our method achieves a significant improvement of over 2.3% and 1.7% in long-range LiDAR point cloud segmentation (30–40 m and 40–50 m, respectively) compared to the baseline segmenter on the SemanticKITTI dataset. PASeg improves urban segmentation for smart, sustainable city development.
- Research Article
- 10.1016/j.cag.2022.06.010
- Jun 26, 2022
- Computers & Graphics
Improving performance of deep learning models for 3D point cloud semantic segmentation via attention mechanisms
- Research Article
- 10.1080/01431161.2023.2297177
- Jan 17, 2024
- International Journal of Remote Sensing
The point cloud has emerged as the most popular three-dimensional (3D) data format in recent years for several scientific and industrial applications. Point cloud semantic segmentation, a crucial stage in 3D analysis and scene comprehension, has piqued researchers' interest. Deep learning-based processing has become more feasible as point cloud acquisition tools, i.e., LiDAR systems, have become more available at the user end. Point cloud learning has achieved tremendous success in object detection, object categorization, and semantic segmentation. To summarize recent works in chronological order, a comprehensive review of projection-, voxel-, and direct point-based point cloud semantic segmentation methods is performed from various perspectives. The commonly used point cloud benchmark datasets and their characteristics are discussed, and they are used for the performance analysis and comparison of several state-of-the-art segmentation methods. The quantitative performance analysis of these deep learning models summarizes the trend of semantic segmentation of point clouds. In the context of point cloud semantic segmentation, the various methods have specific roles. Based on the review of how the methods work and their performance analysis, it is concluded that projection-based methods prioritize efficiency, which is ideal when a high-performance computing system is unavailable. Voxel-based methods capture overall context, serving well in 3D object classification. Point-based approaches excel in fine details and efficiency, suited for tasks like 3D semantic segmentation. Choosing a suitable method depends on the task, data, and resources. KPConv and DGCNN are popular choices, especially for precision and adaptability to point density. However, method performance varies, underlining the need for tailored selection. Hybrid approaches, combining the strengths of several methods, promise superior results.
- Research Article
- 10.3390/rs15225324
- Nov 11, 2023
- Remote Sensing
A textured urban 3D mesh is an important part of 3D real-scene technology. Semantically segmenting an urban 3D mesh is a key task in the photogrammetry and remote sensing field. However, due to the irregular structure of a 3D mesh and redundant texture information, obtaining accurate and robust semantic segmentation results for an urban 3D mesh is a challenging issue. To address this issue, we propose a semantic urban 3D mesh segmentation network (MeshNet) with sparse prior (SP), named MeshNet-SP. MeshNet-SP consists of a differentiable sparse coding (DSC) subnetwork and a semantic feature extraction (SFE) subnetwork. The DSC subnetwork learns low-intrinsic-dimensional features from raw texture information, which increases the effectiveness and robustness of semantic urban 3D mesh segmentation. The SFE subnetwork produces high-level semantic features from the combination of the geometric features of the mesh and the low-intrinsic-dimensional features of the texture information. The proposed method is evaluated on the SUM dataset. The results of ablation experiments demonstrate that the low-intrinsic-dimensional feature is the key to achieving accurate and robust semantic segmentation results. The comparison results show that the proposed method achieves competitive accuracies, with maximum increases of 34.5%, 35.4%, and 31.8% in mR, mF1, and mIoU, respectively.
- Conference Article
- 10.1109/mass50613.2020.00079
- Dec 1, 2020
Three-Dimensional (3D) semantic segmentation is an essential building block for interactive Augmented Reality (AR). However, existing Deep Neural Network (DNN) models for segmenting 3D objects are not only computation-intensive but also memory-heavy, hindering their deployment on resource-constrained mobile devices. We present the design, implementation and evaluation of Slimmer, a generic and model-independent framework for accelerating 3D semantic segmentation and facilitating its real-time applications on mobile devices. In contrast to the current practice that directly feeds a point cloud to DNN models, Slimmer is motivated by our observation that these models retain high accuracy even if we remove a fraction of points from the input, which can significantly reduce the inference time and memory usage of these models. Our design of Slimmer faces two key challenges. First, the simplification method for point clouds must be lightweight; otherwise, the reduced inference time may be canceled out by the overhead of input-data simplification. Second, Slimmer still needs to accurately segment the removed points to produce a complete segmentation of the original input, again using a lightweight method. Our extensive performance evaluation demonstrates that, by addressing these two challenges, Slimmer can dramatically reduce the resource utilization of a representative DNN model for 3D semantic segmentation. For example, if we can tolerate 1% accuracy loss, the reduction is ~20% for inference time and ~9% for memory usage. The reduction increases to ~27% for inference time and ~15% for memory usage when we can tolerate 2% accuracy loss.
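Slimmer's second challenge, cheaply labeling the removed points to recover a complete segmentation, could plausibly be handled by neighbor-based label propagation; the sketch below is an assumption-laden stand-in for illustration, not Slimmer's actual method:

```python
import numpy as np

def propagate_labels(kept_xyz, kept_labels, removed_xyz, k=3):
    """Label removed points by majority vote of their k nearest kept points.

    The DNN segments only the kept subset; dropped points then inherit
    labels from nearby kept neighbours. Brute-force distances for
    clarity; a KD-tree would scale better.
    """
    d2 = ((removed_xyz[:, None, :] - kept_xyz[None, :, :]) ** 2).sum(-1)
    nn = np.argsort(d2, axis=1)[:, :k]     # k nearest kept points
    votes = kept_labels[nn]                # (M, k) neighbour labels
    # Majority vote per removed point.
    return np.array([np.bincount(v).argmax() for v in votes])
```

The cost of this recovery step is what must stay below the inference time saved by segmenting fewer points.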
- Research Article
- 10.3390/rs14143415
- Jul 16, 2022
- Remote Sensing
Three-dimensional digital models play a pivotal role in city planning, monitoring, and sustainable management of smart and Digital Twin Cities (DTCs). In this context, semantic segmentation of airborne 3D point clouds is crucial for modeling, simulating, and understanding large-scale urban environments. Previous research has demonstrated that the performance of 3D semantic segmentation can be improved by fusing 3D point clouds with other data sources. In this paper, a new prior-level fusion approach is proposed for semantic segmentation of large-scale urban areas using optical images and point clouds. The proposed approach uses image classification obtained by the Maximum Likelihood Classifier as prior knowledge for 3D semantic segmentation. The raster values from the classified images are then assigned to the LiDAR point clouds at the data preparation step. Finally, an advanced Deep Learning model (RandLaNet) is adopted to perform the 3D semantic segmentation. The results show that the proposed approach performs well in terms of both evaluation metrics and visual examination, with a higher Intersection over Union (96%) on the created dataset, compared with 92% for the non-fusion approach.
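The prior-level fusion step described above, assigning classified-raster values to LiDAR points during data preparation, can be sketched as a simple raster lookup; the function name and geo-referencing convention are illustrative assumptions, not the paper's actual interface:

```python
import numpy as np

def attach_image_prior(points_xy, class_raster, origin, cell_size):
    """Assign each LiDAR point the class of the raster cell it falls in.

    `class_raster` is a (rows, cols) array of per-pixel class IDs from
    the image classifier; `origin` is the (x, y) world coordinate of the
    raster's top-left corner. Points outside the raster are clipped to
    the nearest edge cell.
    """
    cols = ((points_xy[:, 0] - origin[0]) / cell_size).astype(int)
    rows = ((origin[1] - points_xy[:, 1]) / cell_size).astype(int)
    rows = np.clip(rows, 0, class_raster.shape[0] - 1)
    cols = np.clip(cols, 0, class_raster.shape[1] - 1)
    return class_raster[rows, cols]        # prior class per point
```

The looked-up class ID would then be appended as an extra per-point feature channel before training the segmentation network.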
- Research Article
- 10.1016/j.compag.2021.106445
- Sep 13, 2021
- Computers and Electronics in Agriculture
3D point cloud semantic segmentation toward large-scale unstructured agricultural scene classification
- Research Article
- 10.3390/rs16010092
- Dec 25, 2023
- Remote Sensing
The semantic segmentation of drone LiDAR data is important in intelligent industrial operation and maintenance. However, current methods are not effective at directly processing airborne true-color point clouds that contain geometric and color noise. To overcome this challenge, we propose a novel hybrid learning framework, named SSGAM-Net, which combines supervised and semi-supervised modules for segmenting objects from airborne noisy point clouds. To the best of our knowledge, we are the first to build a true-color industrial point cloud dataset, which is obtained by drones and covers 90,000 m². Secondly, we propose a plug-and-play module, named the Global Adjacency Matrix (GAM), which uses only a few labeled samples to generate pseudo-labels and guide the network to learn spatial relationships between objects in semi-supervised settings. Finally, we build our point cloud semantic segmentation network, SSGAM-Net, which combines the semi-supervised GAM module and a supervised Encoder–Decoder module. To evaluate the performance of our proposed method, we conduct experiments comparing SSGAM-Net with existing advanced methods on our expert-labeled dataset. The experimental results show that SSGAM-Net outperforms the current advanced methods, reaching 85.3% mIoU, which is 4.2 to 58.0 percentage points higher than the other methods.