Automated BIM-to-scan point cloud semantic segmentation using a domain adaptation network with hybrid attention and whitening (DawNet)
- Research Article
200
- 10.1016/j.autcon.2020.103144
- Mar 11, 2020
- Automation in Construction
Semantic segmentation of point clouds of building interiors with deep learning: Augmenting training datasets with synthetic BIM-based point clouds
- Research Article
- 10.3390/data11010016
- Jan 12, 2026
- Data
In intelligent construction and BIM–Reality integration applications, high-quality, large-scale construction scene point cloud data with component-level semantic annotations constitute a fundamental basis for three-dimensional semantic understanding and automated analysis. However, point clouds acquired from real construction sites commonly suffer from high labeling costs, severe occlusion, and unstable data distributions. Existing public datasets remain insufficient in terms of scale, component coverage, and annotation consistency, limiting their suitability for data-driven approaches. To address these challenges, this paper constructs and releases a BIM-derived synthetic construction scene point cloud dataset, termed the Synthetic Point Cloud (SPC), targeting component-level point cloud semantic segmentation and related research tasks. The dataset is generated from publicly available BIM models through physics-based virtual LiDAR scanning, producing multi-view and multi-density three-dimensional point clouds while automatically inheriting component-level semantic labels from BIM without any manual intervention. The SPC dataset comprises 132 virtual scanning scenes, with an overall scale of approximately 8.75 × 10^9 points, covering typical construction components such as walls, columns, beams, and slabs. By systematically configuring scanning viewpoints, sampling densities, and occlusion conditions, the dataset introduces rich geometric and spatial distribution diversity. This paper presents a comprehensive description of the SPC data generation pipeline, semantic mapping strategy, virtual scanning configurations, and data organization scheme, followed by statistical analysis and technical validation in terms of point cloud scale evolution, spatial coverage characteristics, and component-wise semantic distributions. Furthermore, baseline experiments on component-level point cloud semantic segmentation are provided.
The results demonstrate that models trained solely on the SPC dataset can achieve stable and engineering-meaningful component-level predictions on real construction point clouds, validating the dataset’s usability in virtual-to-real research scenarios. As a scalable and reproducible BIM-derived point cloud resource, the SPC dataset offers a unified data foundation and experimental support for research on construction scene point cloud semantic segmentation, virtual-to-real transfer learning, scan-to-BIM updating, and intelligent construction monitoring.
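The label-inheritance idea in the abstract above can be illustrated with a toy virtual scanner. This is only a minimal sketch under stated assumptions, not the SPC pipeline: BIM components are stood in for by labeled axis-aligned boxes, the scanner fires a fixed fan of rays, and the names `ray_box_distance` and `virtual_scan`, the angular ranges, and the ray counts are all hypothetical. Each ray keeps its nearest hit, so the returned point inherits that component's label and occluded components receive no points.

```python
import math
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]
# Hypothetical stand-in for a BIM component: (min corner, max corner, class label).
Component = Tuple[Vec3, Vec3, str]


def ray_box_distance(origin: Vec3, direction: Vec3, lo: Vec3, hi: Vec3) -> Optional[float]:
    """Slab test: distance along the ray to an axis-aligned box, or None on a miss."""
    t_near, t_far = 0.0, math.inf
    for o, d, a, b in zip(origin, direction, lo, hi):
        if abs(d) < 1e-12:
            if o < a or o > b:  # ray is parallel to this slab and lies outside it
                return None
            continue
        t1, t2 = (a - o) / d, (b - o) / d
        t_near = max(t_near, min(t1, t2))
        t_far = min(t_far, max(t1, t2))
        if t_near > t_far:
            return None
    return t_near


def virtual_scan(origin: Vec3, components: List[Component],
                 n_azimuth: int = 36, n_elevation: int = 9) -> List[Tuple[Vec3, str]]:
    """Cast a fan of rays and return (hit point, inherited label) pairs."""
    points = []
    for i in range(n_azimuth):
        az = 2.0 * math.pi * i / n_azimuth
        for j in range(n_elevation):
            # Assumed vertical field of view: -40 to +40 degrees.
            el = math.radians(-40.0 + 80.0 * j / (n_elevation - 1))
            d = (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))
            best = None
            for lo, hi, label in components:
                t = ray_box_distance(origin, d, lo, hi)
                # Keep only the nearest surface in front of the scanner (occlusion).
                if t is not None and t > 0.0 and (best is None or t < best[0]):
                    best = (t, label)
            if best is not None:
                t, label = best
                hit = tuple(o + t * x for o, x in zip(origin, d))
                points.append((hit, label))
    return points


# A column standing in front of a wall, scanned from the origin at 1.5 m height.
scene = [((5.0, -3.0, 0.0), (5.2, 3.0, 3.0), "wall"),
         ((2.0, -0.2, 0.0), (2.4, 0.2, 3.0), "column")]
scan = virtual_scan((0.0, 0.0, 1.5), scene)
```

Moving the scanner origin or changing the ray counts changes viewpoint and density without any relabeling, which is the property the abstract relies on.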
- Dissertation
- 10.32657/10356/172480
- Jan 1, 2023
The ability to recognize the three-dimensional (3D) world profoundly impacts our comprehension, visualization, interaction, and re-creation of the physical environment. Point cloud data, renowned for its accurate representation of 3D geometric structures, has gained significant attention in both academia and industry. Meanwhile, deep neural networks (DNNs) have revolutionized various domains, including computer vision and natural language processing. Integrating point clouds with DNNs has given rise to powerful deep point cloud models, enabling enhanced recognition and understanding of the 3D world. However, current DNN models for point cloud recognition heavily rely on large amounts of densely-labelled training data, which is extremely laborious and costly to obtain. This limitation hampers the scalability of existing point cloud datasets and hinders efficient exploration across tasks and applications. This thesis explores Label-Efficient Learning for Point Cloud Recognition, aiming to minimize annotation efforts during deep network training while achieving effective results in point cloud recognition. The study focuses on three key label-efficient learning categories: data augmentation, domain transfer learning from synthetic to real data, and domain transfer learning from normal to adverse weather conditions. Through these representative approaches, we aim to enhance the efficiency and effectiveness of point cloud recognition methodologies. Within the label-efficient learning paradigm, data augmentation plays a crucial role in expanding the diversity of limited labelled training data, requiring fewer annotated point clouds to train accurate recognition models. In this thesis, we introduced a novel LiDAR point cloud augmentation technique that generates new frames within the polar coordinate system, facilitating model training in various 3D perception tasks and scenarios. 
Domain transfer learning from synthetic to real data leverages knowledge from synthetic point clouds with automatically generated labels to enhance the performance of deep models in recognizing real-world point clouds. By using infinite synthetic labelled point clouds, human annotations in real point clouds can be reduced or eliminated, alleviating significant annotation efforts. In this thesis, we first created a large-scale synthetic LiDAR point cloud dataset with precise point-wise annotations. Building upon this dataset, we presented two novel methodologies, involving style translation and unsupervised domain adaptation, to address domain discrepancies between synthetic and real LiDAR point clouds and facilitate synthetic-to-real domain transfer learning. Domain transfer learning from normal to adverse weather data aims to train robust recognition models using point clouds captured under normal weather conditions to perform well across diverse adverse weather conditions. This objective arises from considerable additional challenges in annotating point clouds of adverse weather since they share different geometric data characteristics compared to normal weather data. We explore transferring knowledge from normal to adverse weather point clouds to reduce the need for extensive manual annotations for adverse weather point clouds. To achieve this, we first constructed a large-scale adverse-weather point cloud dataset with point-wise annotations. Subsequently, we proposed a domain generalization and aggregation method, which enables the training of robust models exclusively using normal data, empowering them to effectively handle various adverse weather conditions. Extensive experimentation conducted across diverse point cloud recognition benchmarks demonstrates the superior performance achieved by our proposed label-efficient learning approaches.
- Research Article
32
- 10.1016/j.autcon.2023.105076
- Sep 9, 2023
- Automation in Construction
Skeleton-guided generation of synthetic noisy point clouds from as-built BIM to improve indoor scene understanding
- Research Article
12
- 10.3390/rs15010243
- Dec 31, 2022
- Remote Sensing
Multispectral LiDAR technology can simultaneously acquire spatial geometric data and multispectral wavelength intensity information, which can provide richer attribute features for semantic segmentation of point cloud scenes. However, due to the disordered distribution and huge number of points, fine-grained semantic segmentation of large-scale multispectral LiDAR data remains a challenging task. To deal with this situation, we propose a deep learning network that can leverage contextual semantic information to complete the semantic segmentation of large-scale point clouds. In our network, we fuse local geometry and feature content based on 3D spatial geometric associativity and embed the result into a backbone network. In addition, to cope with the redundant point cloud feature distribution observed in the experiments, we designed a data preprocessing step with principal component extraction to improve the processing capability of the proposed network on the multispectral LiDAR data. Finally, we conduct a series of comparative experiments using multispectral LiDAR point clouds of real land cover in order to objectively evaluate the performance of the proposed method against other advanced methods. The results confirm that the proposed method achieves satisfactory results in real point cloud semantic segmentation. Moreover, the quantitative evaluation metrics show that it reaches state-of-the-art performance.
- Research Article
14
- 10.1177/13694332241260077
- Jun 19, 2024
- Advances in Structural Engineering
Visual recognition of 3D point cloud data of bridge inspection scenes is a key step in automating the visual inspection process, which is currently largely manual and inefficient. To alleviate the lack of large-scale annotated point cloud datasets for training such 3D visual recognition algorithms, this research investigates an approach for developing large-scale synthetic point cloud datasets. The proposed approach proceeds in four steps: (1) random generation of different types of bridges in computer graphics environments; (2) sampling of camera trajectories that represent data collection scenarios during bridge inspection; (3) 3D reconstruction using Structure from Motion (SfM) applied to rendered synthetic images; (4) automated annotation of the reconstructed point cloud using ground truth masks obtained with synthetic images. Besides, this research proposes to store point uncertainty information defined by the error between the ground truth depth and the depth calculated from the SfM results. Prior to training, thresholds can be applied to this uncertainty information to control the levels of outliers in the dataset. This research demonstrates the proposed approach by generating point cloud datasets for two data collection scenarios. The effectiveness of the generated datasets is investigated by training 3D semantic segmentation algorithms and evaluating the performance on real and synthetic point cloud data. The proposed approach for point cloud dataset generation will facilitate the development of generalizable and high level-of-detail 3D recognition algorithms toward autonomous bridge inspection.
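The uncertainty-based outlier control described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the uncertainty is the absolute difference between the two depths, and the function names, labels, and the 0.05 threshold are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def depth_uncertainty(gt_depths: Sequence[float], sfm_depths: Sequence[float]) -> List[float]:
    """Per-point uncertainty, assumed here to be |ground-truth depth - SfM depth|."""
    return [abs(g - s) for g, s in zip(gt_depths, sfm_depths)]


def filter_outliers(points: Sequence[Point], labels: Sequence[str],
                    uncertainty: Sequence[float], threshold: float):
    """Keep only points whose stored uncertainty does not exceed the threshold."""
    keep = [i for i, u in enumerate(uncertainty) if u <= threshold]
    return [points[i] for i in keep], [labels[i] for i in keep]


# Three reconstructed points; the second has a large depth error.
points = [(0.0, 0.0, 10.0), (1.0, 0.0, 12.5), (2.0, 0.0, 8.0)]
labels = ["deck", "girder", "pier"]
unc = depth_uncertainty([10.0, 12.0, 8.0], [10.02, 12.5, 7.99])
clean_points, clean_labels = filter_outliers(points, labels, unc, threshold=0.05)
```

Because the uncertainty is stored with the dataset rather than applied once, the same scene can be re-filtered with a looser or stricter threshold before each training run.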
- Conference Article
87
- 10.1109/iccvw.2019.00404
- Oct 1, 2019
Point cloud data from 3D LiDAR sensors are one of the most crucial sensor modalities for versatile safety-critical applications such as self-driving vehicles. Since the annotation of point cloud data is an expensive and time-consuming process, the utilisation of simulated environments and 3D LiDAR sensors for this task has recently started to gain popularity. However, the generated synthetic point cloud data still lack the artefacts that usually exist in point cloud data from real 3D LiDAR sensors. Thus, in this work, we propose a domain adaptation framework for bridging this gap between synthetic and real point cloud data. Our proposed framework is based on the deep cycle-consistent generative adversarial network (CycleGAN) architecture. We have evaluated the performance of our proposed framework on the task of vehicle detection from bird's eye view (BEV) point cloud images coming from real 3D LiDAR sensors. The framework has shown competitive results, with an improvement of more than 7% in average precision score over other baseline approaches when tested on real BEV point cloud images.
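The BEV point cloud images mentioned in the abstract above are top-down rasterizations of the scan. The sketch below shows one common way to build such a grid and covers only that projection step, not the CycleGAN adaptation. The function name `to_bev_grid`, the ranges, the 0.5 m cell size, and the choice of maximum height as the cell value are all assumptions for illustration.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]


def to_bev_grid(points: Sequence[Point],
                x_range: Tuple[float, float] = (0.0, 40.0),
                y_range: Tuple[float, float] = (-20.0, 20.0),
                cell: float = 0.5) -> List[List[Optional[float]]]:
    """Rasterize (x, y, z) points into a top-down grid holding the max height per cell.

    Cells that receive no point stay None.
    """
    n_rows = int(round((x_range[1] - x_range[0]) / cell))
    n_cols = int(round((y_range[1] - y_range[0]) / cell))
    grid: List[List[Optional[float]]] = [[None] * n_cols for _ in range(n_rows)]
    for x, y, z in points:
        if not (x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]):
            continue  # drop points outside the rasterized area
        # Clamp guards against floating-point rounding at the upper border.
        r = min(int((x - x_range[0]) / cell), n_rows - 1)
        c = min(int((y - y_range[0]) / cell), n_cols - 1)
        if grid[r][c] is None or z > grid[r][c]:
            grid[r][c] = z
    return grid


# Two points fall into the same cell; the third lies outside the grid.
bev = to_bev_grid([(1.2, 0.3, 0.5), (1.3, 0.4, 1.7), (100.0, 0.0, 0.0)])
```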
- Research Article
96
- 10.1609/aaai.v36i3.20183
- Jun 28, 2022
- Proceedings of the AAAI Conference on Artificial Intelligence
Knowledge transfer from synthetic to real data has been widely studied to mitigate data annotation constraints in various computer vision tasks such as semantic segmentation. However, these studies have focused on 2D images, and their counterpart in 3D point cloud segmentation lags far behind due to the lack of large-scale synthetic datasets and effective transfer methods. We address this issue by collecting SynLiDAR, a large-scale synthetic LiDAR dataset that contains point-wise annotated point clouds with accurate geometric shapes and comprehensive semantic classes. SynLiDAR was collected from multiple virtual environments with rich scenes and layouts, and consists of over 19 billion points of 32 semantic classes. In addition, we design PCT, a novel point cloud translator that effectively mitigates the gap between synthetic and real point clouds. Specifically, we decompose the synthetic-to-real gap into an appearance component and a sparsity component and handle them separately, which improves the point cloud translation greatly. We conducted extensive experiments over three transfer learning setups including data augmentation, semi-supervised domain adaptation and unsupervised domain adaptation. The experiments show that SynLiDAR provides a high-quality data source for studying 3D transfer and that the proposed PCT achieves superior point cloud translation consistently across the three setups. The dataset is available at https://github.com/xiaoaoran/SynLiDAR.
- Research Article
9
- 10.5194/isprs-archives-xliv-4-w1-2020-95-2020
- Sep 3, 2020
- The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Abstract. Point clouds obtained via Terrestrial Laser Scanning (TLS) surveys of historical buildings are generally transformed into semantically structured 3D models with manual and time-consuming workflows. The importance of automatizing this process is widely recognized within the research community. Recently, deep neural architectures have been applied for semantic segmentation of point clouds, but few studies have evaluated them in the Cultural Heritage domain, where complex shapes and mouldings make this task challenging. In this paper, we describe our experiments with the DGCNN architecture to semantically segment historical buildings point clouds, acquired with TLS. We propose a variation of the original approach where a radius distance based technique is used instead of K-Nearest Neighbors (KNN) to represent the neighborhood of points. We show that our approach provides better results by evaluating it on two real TLS point clouds, representing two Italian historical buildings: the Ducal Palace in Urbino and the Palazzo Ferretti in Ancona.
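The difference between the two neighborhood definitions in the abstract above can be shown with a brute-force sketch. This is an illustration of the general idea only, not the modified DGCNN layer: the function names and the optional `max_neighbors` cap are hypothetical, and a real implementation would use a spatial index rather than a full scan.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]


def knn_neighbors(points: Sequence[Point], query_idx: int, k: int) -> List[int]:
    """Indices of the k nearest points to the query, regardless of how far away they are."""
    q = points[query_idx]
    others = [i for i in range(len(points)) if i != query_idx]
    others.sort(key=lambda i: math.dist(q, points[i]))
    return others[:k]


def radius_neighbors(points: Sequence[Point], query_idx: int, radius: float,
                     max_neighbors: Optional[int] = None) -> List[int]:
    """Indices of all points within `radius` of the query, nearest first."""
    q = points[query_idx]
    hits = [i for i in range(len(points))
            if i != query_idx and math.dist(q, points[i]) <= radius]
    hits.sort(key=lambda i: math.dist(q, points[i]))
    return hits if max_neighbors is None else hits[:max_neighbors]


# A tight cluster plus one distant point.
cloud = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.2, 0.0, 0.0), (5.0, 0.0, 0.0)]
knn = knn_neighbors(cloud, 0, k=3)             # also pulls in the far point at x = 5
ball = radius_neighbors(cloud, 0, radius=0.5)  # stays within the local cluster
```

KNN always returns k indices, so in sparse regions it reaches across large gaps. The radius query bounds the spatial extent instead, at the cost of a variable (possibly empty) neighbor count.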
- Research Article
23
- 10.3390/rs15092371
- Apr 30, 2023
- Remote Sensing
The accurate semantic segmentation of point cloud data is the basis for their application in the inspection of extra high-voltage transmission lines (EHVTL). As deep learning evolves, point-wise-based deep neural networks have shown great potential for the semantic segmentation of EHVTL point clouds. However, EHVTL point cloud data are characterized by a large data volume and significant class imbalance. Therefore, the down-sampling method and point cloud feature extraction method used in current point-wise-based deep neural networks hardly meet the needs of computational accuracy and efficiency. In this paper, we proposed a two-step down-sampling method and a point cloud feature extraction method based on local feature aggregation of the point clouds after down-sampling in each layer of the model (LFAPAD). We then established a deep neural network named PowerLine-Net for the semantic segmentation of the EHVTL point clouds. Furthermore, in order to test and analyze the performance of PowerLine-Net, we constructed a point cloud dataset for the EHVTL scenes. Using this dataset and the Semantic3D dataset, we implemented network parameter testing, semantic segmentation, and an accuracy comparison of different networks based on PowerLine-Net. The results illustrate that the semantic segmentation model proposed in this paper has a high computational efficiency and accuracy in the semantic segmentation of EHVTL point clouds. Compared with conventional deep neural networks, including PointCNN, KPConv, SPG, PointNet++, and RandLA-Net, PowerLine-Net also achieves a higher accuracy in the semantic segmentation of EHVTL point clouds. Moreover, based on the results predicted by PowerLine-Net, the risk point detection for EHVTL point clouds has been achieved, which demonstrates the important value of this network in practical applications. 
In addition, as shown by the results of Semantic3D, PowerLine-Net also achieves a high segmentation accuracy, which proves its powerful capability and wide applicability in semantic segmentation for the point clouds of large-scale scenes.
- Research Article
7
- 10.1109/taes.2024.3517574
- Apr 1, 2025
- IEEE Transactions on Aerospace and Electronic Systems
Light detection and ranging (LiDAR) sensors provide accurate 3-D point clouds for noncooperative spacecraft pose estimation. Several robust methods, such as iterative closest point, exist to perform a local refinement of the pose starting from an initial estimate. However, finding the initial pose of the spacecraft is a global optimization problem, which is challenging to solve in real time. This is especially true on space hardware with limited computing power. In addition, many spacecraft have shapes with multiple symmetries, making an unambiguous initial pose estimation impossible. This work introduces a convolutional-neural-network-based pose estimation method that accounts for potential symmetries of the target satellite. The point clouds are projected to a 2-D depth image before being processed by the network. To generate a sufficient amount of training data, a LiDAR simulator integrating multiple effects such as reflections or laser beam divergence is developed. Although trained solely on synthetic point clouds, the pose estimation method proves to be precise, efficient, and reliable when evaluated on real point clouds taken at a hardware-in-the-loop rendezvous test facility. A runtime evaluation on potential space computing hardware is also performed to demonstrate the applicability of the method to real-time onboard pose estimation.
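The projection of a point cloud to a 2-D depth image, as mentioned in the abstract above, can be sketched with a simple pinhole model. The abstract does not specify the projection used, so the pinhole model, the image size, the focal length, and the name `project_to_depth_image` are assumptions for illustration. Points are taken to be in the sensor frame with z pointing forward.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def project_to_depth_image(points: Sequence[Point], width: int = 64, height: int = 48,
                           focal: float = 50.0) -> List[List[float]]:
    """Project sensor-frame points onto an image plane, keeping the nearest depth per pixel.

    Pixels with no return hold 0.0.
    """
    cx, cy = width / 2.0, height / 2.0
    image = [[0.0] * width for _ in range(height)]
    for x, y, z in points:
        if z <= 0.0:
            continue  # behind the sensor
        u = int(math.floor(focal * x / z + cx))
        v = int(math.floor(focal * y / z + cy))
        if 0 <= u < width and 0 <= v < height:
            # Nearest surface wins when several points land in one pixel.
            if image[v][u] == 0.0 or z < image[v][u]:
                image[v][u] = z
    return image


# Two points on the optical axis, one behind the sensor, one outside the field of view.
depth = project_to_depth_image([(0.0, 0.0, 10.0), (0.0, 0.0, 5.0),
                                (0.0, 0.0, -1.0), (100.0, 0.0, 1.0)])
```

The resulting fixed-size array is what allows a standard 2-D convolutional network to consume an unordered point set.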
- Research Article
26
- 10.1016/j.autcon.2021.103839
- Aug 5, 2021
- Automation in Construction
This paper presents a method for synthesizing mobile laser scanning point clouds of railroad level crossings that can be used to train neural networks for point cloud segmentation. The method arranges point cloud samples representing individual objects into new scenes using a set of simple placement rules. The point cloud samples can be cropped from real point clouds, created from 3D mesh models, or procedurally generated using mathematical functions. The scenes can consist of one or more types of samples, making it possible to combine real and synthetic data. The findings show that a network trained on scenes generated from real point cloud samples achieved a better overall F1-score than a network trained on real scenes. Also, the performance of a network trained on a very small number of real scenes can be improved by adding fully synthetic scenes to the training data.
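The scene-synthesis idea in the abstract above, arranging labeled object samples by placement rules, can be sketched as below. The paper's actual rules are not given in the abstract, so the two rules used here (a random yaw about the vertical axis and a minimum spacing between object centers) and the names `place_sample` and `synthesize_scene` are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def place_sample(sample: Sequence[Point], yaw: float, offset: Tuple[float, float]) -> List[Point]:
    """Rotate a sample about the z axis by `yaw`, then translate it in the ground plane."""
    c, s = math.cos(yaw), math.sin(yaw)
    ox, oy = offset
    return [(c * x - s * y + ox, s * x + c * y + oy, z) for x, y, z in sample]


def synthesize_scene(samples: Dict[str, Sequence[Point]], n_objects: int,
                     extent: float = 20.0, min_spacing: float = 2.0,
                     seed: int = 0) -> List[Tuple[Point, str]]:
    """Build a labeled scene by dropping randomly chosen samples at spaced-out positions."""
    rng = random.Random(seed)
    names = sorted(samples)
    centers: List[Tuple[float, float]] = []
    scene: List[Tuple[Point, str]] = []
    attempts = 0
    while len(centers) < n_objects and attempts < 1000:
        attempts += 1
        candidate = (rng.uniform(-extent, extent), rng.uniform(-extent, extent))
        if any(math.dist(candidate, c) < min_spacing for c in centers):
            continue  # assumed placement rule: keep object centers apart
        label = rng.choice(names)
        yaw = rng.uniform(0.0, 2.0 * math.pi)
        # Every point of the placed sample carries the sample's label.
        scene.extend((p, label) for p in place_sample(samples[label], yaw, candidate))
        centers.append(candidate)
    return scene


# Toy samples; in practice these could be cropped from real scans or sampled from meshes.
library = {"pole": [(0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, 2.0)],
           "sign": [(0.0, 0.0, 2.0), (0.3, 0.0, 2.0)]}
scene = synthesize_scene(library, n_objects=5, seed=1)
```

Since the library is keyed only by label, real and synthetic samples can be mixed in one scene simply by adding both to the dictionary.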
- Research Article
72
- 10.1145/3409262
- Dec 3, 2020
- Journal on Computing and Cultural Heritage
Historical heritage demands robust pipelines for obtaining Heritage Building Information Modeling models that are fully interoperable and rich in informative content. The definition of efficient Scan-to-BIM workflows represents a very important step toward more efficient management of historical real estate, as creating structured three-dimensional (3D) models from point clouds is complex and time-consuming. In this scenario, semantic segmentation of 3D point clouds is gaining more and more attention, since it might help to automatically recognize historical architectural elements. Recent Deep Learning approaches have proved to provide reliable and affordable degrees of automation in other contexts, such as road scene understanding. However, semantic segmentation is particularly challenging in historical and classical architecture, due to the complexity of shapes and the limited repeatability of elements across different buildings, which makes it difficult to define common patterns within the same class of elements. Furthermore, as Deep Learning models require a considerably large amount of annotated data to be trained and tuned to properly handle unseen scenes, the lack of large, publicly available annotated point clouds in the historical building domain is a major problem that blocks research in this direction. Creating a critical mass of annotated point clouds by manual annotation is very time-consuming and impractical. To tackle this issue, in this work we explore the idea of leveraging synthetic point cloud data to train Deep Learning models to perform semantic segmentation of point clouds obtained via Terrestrial Laser Scanning. The aim is to provide a first assessment of the use of synthetic data to drive Deep Learning-based semantic segmentation in the context of historical buildings. To achieve this purpose, we present an improved version of the Dynamic Graph CNN (DGCNN) named RadDGCNN.
The main improvement consists in exploiting the radius distance. In our experiments, we evaluate the trained models on a publicly available synthetic dataset of two different historical buildings: the Ducal Palace in Urbino, Italy, and Palazzo Ferretti in Ancona, Italy. RadDGCNN yields good results, demonstrating improved segmentation performance on the real TLS datasets.
- Research Article
93
- 10.1016/j.isprsjprs.2021.03.001
- Mar 23, 2021
- ISPRS Journal of Photogrammetry and Remote Sensing
A point-based deep learning network for semantic segmentation of MLS point clouds
- Conference Article
- 10.1109/m2vip49856.2021.9665142
- Nov 26, 2021
Pose estimation refers to the acquisition of a rigid transformation of an object relative to its original model coordinate system. This paper proposes a deep learning-based approach for pose estimation from point clouds of textureless objects. The contributions of this paper can be summarized as follows: (1) a multi-scale local feature aggregation strategy that emphasizes the neighboring regions of interest points; (2) an extension of the original spatial transformer network to point clouds, with pose estimation and object classification as the outputs of a single network; (3) a deep learning model that combines a symmetric function with multi-scale features to improve the accuracy and robustness of the network, together with a newly defined joint loss function that considers the objectives of both pose estimation and classification. Experiments with point clouds of textureless objects as input show that the proposed framework performs pose estimation effectively on both synthetic and real point clouds.