Machine and Deep Learning Implementations for Heritage Building Information Modelling: A Critical Review of Theoretical and Applied Research
Research domain and Problem: HBIM modelling from point cloud data has become a crucial research topic in the last decade, since HBIM is regarded as a potential central data model that paves the way for digital heritage practice beyond digitization. Reality capture technologies such as terrestrial laser scanning, drone-mounted LiDAR sensors and photogrammetry produce sub-millimetre accurate point cloud files that can be used as reference files for Heritage Building Information Modelling (HBIM). However, HBIM modelling from the point cloud data of heritage buildings is mainly manual, error-prone, and time-consuming. Furthermore, image processing techniques are insufficient for classifying and segmenting point cloud data in a way that speeds up and enhances the current HBIM modelling workflow. Because of the challenges and bottlenecks in the scan-to-HBIM process, which is commonly criticized as complex because of its bespoke requirements, semantic segmentation of point clouds is gaining popularity in the literature. Research Aim and Methodology: Therefore, this paper aims to provide a thorough critical review of Machine Learning and Deep Learning methods for point cloud segmentation, classification, and BIM geometry automation for cultural heritage case study applications. Research findings: This paper catalogues the challenges of HBIM practice and the opportunities for semantic point cloud segmentation found across academic literature in the last decade. Beyond definitions and basic occurrence statistics, this paper discusses the success rates and implementation challenges of machine and deep learning classification methods. Research value and contribution: This paper provides a holistic review of point cloud segmentation and its potential for further development and application in the Cultural Heritage sector. The critical analysis provides insight into the current state-of-the-art methods and advises on their suitability for HBIM projects. 
The review has identified highly original threads of research, which hold the potential to significantly influence practice and further applied research.
- Research Article
72
- 10.1145/3409262
- Dec 3, 2020
- Journal on Computing and Cultural Heritage
Historical heritage demands robust pipelines for obtaining Heritage Building Information Modeling models that are fully interoperable and rich in informative content. The definition of efficient Scan-to-BIM workflows represents a very important step toward more efficient management of historical real estate, as creating structured three-dimensional (3D) models from point clouds is complex and time-consuming. In this scenario, semantic segmentation of 3D point clouds is gaining more and more attention, since it might help to automatically recognize historical architectural elements. Recent Deep Learning approaches have proved to provide reliable and affordable degrees of automation in other contexts, such as road scene understanding. However, semantic segmentation is particularly challenging in historical and classical architecture, due to the complexity of shapes and the limited repeatability of elements across different buildings, which makes it difficult to define common patterns within the same class of elements. Furthermore, as Deep Learning models require a considerably large amount of annotated data to be trained and tuned to properly handle unseen scenes, the lack of (big) publicly available annotated point clouds in the historical building domain is a huge problem, which in fact blocks research in this direction. Creating a critical mass of annotated point clouds by manual annotation, however, is very time-consuming and impractical. To tackle this issue, in this work we explore the idea of leveraging synthetic point cloud data to train Deep Learning models to perform semantic segmentation of point clouds obtained via Terrestrial Laser Scanning. The aim is to provide a first assessment of the use of synthetic data to drive Deep Learning-based semantic segmentation in the context of historical buildings. To achieve this purpose, we present an improved version of the Dynamic Graph CNN (DGCNN) named RadDGCNN. 
The main improvement consists in exploiting the radius distance. In our experiments, we evaluate the trained models on a (publicly available) synthetic dataset of two different historical buildings: the Ducal Palace in Urbino, Italy, and Palazzo Ferretti in Ancona, Italy. RadDGCNN yields good results, demonstrating improved segmentation performance on the TLS real datasets.
- Research Article
- 10.1093/forestry/cpaf062
- Oct 14, 2025
- Forestry: An International Journal of Forest Research
Semantic segmentation of point clouds using deep learning (DL) has been the subject of research in forestry in recent years due to its potential applications. Several scientific and management disciplines, such as biodiversity monitoring, ecosystem carbon assessments, or forest management could benefit from this technique. However, it requires manual segmentation of point clouds to be used as training data. This process is highly labour-intensive and time-consuming, and there is a notable lack of publicly available datasets to support the development of accurate DL semantic segmentation models for forestry and forest ecology applications. Here, we present SegmentedForests, a curated dataset of manually segmented ground-based point clouds from forest plots, specifically designed to facilitate the training and validation of semantic segmentation models. This publicly available dataset contains >920 million labelled points from 14 forest plots, acquired using both terrestrial laser scanning (TLS) and mobile laser scanning (MLS) technologies. It covers two hectares of broadleaf, conifer, and mixed stands from different bioclimatic regions and features >1600 trees across 16 tree species. Each point cloud is labelled into multiple vegetation classes (up to 16), such as tree stems, branches, grass, shrubs, and down wood, as well as non-vegetation elements commonly present in forest scenes, including rocks, people, and stakes. Data splits to facilitate DL model development using our dataset are provided as well. The dataset is available at https://zenodo.org/records/17396681. By releasing this annotated dataset, we seek to address the critical need for publicly available, high-quality training data for DL models that perform semantic segmentation of ground-based point clouds in forest ecosystems.
- Research Article
24
- 10.3390/rs15092371
- Apr 30, 2023
- Remote Sensing
The accurate semantic segmentation of point cloud data is the basis for their application in the inspection of extra high-voltage transmission lines (EHVTL). As deep learning evolves, point-wise-based deep neural networks have shown great potential for the semantic segmentation of EHVTL point clouds. However, EHVTL point cloud data are characterized by a large data volume and significant class imbalance. Therefore, the down-sampling method and point cloud feature extraction method used in current point-wise-based deep neural networks hardly meet the needs of computational accuracy and efficiency. In this paper, we proposed a two-step down-sampling method and a point cloud feature extraction method based on local feature aggregation of the point clouds after down-sampling in each layer of the model (LFAPAD). We then established a deep neural network named PowerLine-Net for the semantic segmentation of the EHVTL point clouds. Furthermore, in order to test and analyze the performance of PowerLine-Net, we constructed a point cloud dataset for the EHVTL scenes. Using this dataset and the Semantic3D dataset, we implemented network parameter testing, semantic segmentation, and an accuracy comparison of different networks based on PowerLine-Net. The results illustrate that the semantic segmentation model proposed in this paper has a high computational efficiency and accuracy in the semantic segmentation of EHVTL point clouds. Compared with conventional deep neural networks, including PointCNN, KPConv, SPG, PointNet++, and RandLA-Net, PowerLine-Net also achieves a higher accuracy in the semantic segmentation of EHVTL point clouds. Moreover, based on the results predicted by PowerLine-Net, the risk point detection for EHVTL point clouds has been achieved, which demonstrates the important value of this network in practical applications. 
In addition, as shown by the results of Semantic3D, PowerLine-Net also achieves a high segmentation accuracy, which proves its powerful capability and wide applicability in semantic segmentation for the point clouds of large-scale scenes.
- Research Article
10
- 10.5194/isprs-archives-xlviii-1-2024-387-2024
- May 10, 2024
- The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Abstract. This paper presents a systematic review of published case studies on Heritage Building Information Modelling (HBIM) since 2018, and identifies research gaps in the subject matter. Building upon the foundational work of Ewart and Zuecco (2019), this research aims to reveal the latest trends in HBIM implementation, identify recent developments of HBIM technologies, changes in the purpose of HBIM programs and stakeholder roles and responsibilities, and uncover knowledge gaps that provide avenues for future research. Utilizing the Preferred Reporting Items for Systematic Review and Meta-Analysis (PRISMA) approach, two major academic databases, Scopus and Web of Science (WOS), were searched, resulting in a rich and diverse dataset for analysis. The paper reports findings on the status of reality capture techniques used to acquire data for HBIM development, focusing on terrestrial laser scanning (TLS) technology. The review highlights the benefits and limitations of TLS for data acquisition in HBIM, as well as the integration of TLS with other reality capture technologies, such as Structure from Motion (SfM) and photogrammetry. The paper further outlines the typical workflow for processing TLS scan data and explores the integration of multiple point clouds for comprehensive heritage site modeling. In addition to the state of the art, this systematic review also uncovers several research gaps in the field of HBIM that offer opportunities for future research and innovation, including the lack of guidelines for data acquisition in HBIM programs, the predominantly manual development process of HBIM from TLS point cloud data, and the under-utilized capacity of TLS for long-term monitoring and change detection. This comprehensive review provides valuable insights into the current landscape of HBIM, offering guidance for future research and development in the heritage sector and highlighting areas in need of further investigation to advance the field.
- Research Article
14
- 10.1016/j.jag.2022.102974
- Sep 1, 2022
- International Journal of Applied Earth Observation and Geoinformation
A self-attention based global feature enhancing network for semantic segmentation of large-scale urban street-level point clouds
- Research Article
9
- 10.5194/isprs-archives-xliv-4-w1-2020-95-2020
- Sep 3, 2020
- The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Abstract. Point clouds obtained via Terrestrial Laser Scanning (TLS) surveys of historical buildings are generally transformed into semantically structured 3D models with manual and time-consuming workflows. The importance of automating this process is widely recognized within the research community. Recently, deep neural architectures have been applied for semantic segmentation of point clouds, but few studies have evaluated them in the Cultural Heritage domain, where complex shapes and mouldings make this task challenging. In this paper, we describe our experiments with the DGCNN architecture to semantically segment point clouds of historical buildings acquired with TLS. We propose a variation of the original approach in which a radius-distance-based technique is used instead of K-Nearest Neighbors (KNN) to represent the neighborhood of points. We show that our approach provides better results by evaluating it on two real TLS point clouds, representing two Italian historical buildings: the Ducal Palace in Urbino and the Palazzo Ferretti in Ancona.
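The difference between the two neighborhood definitions mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy coordinates, and the brute-force search are assumptions made for illustration, and a real pipeline would normally use a spatial index and a deep learning framework.

```python
import math

def knn_neighbors(points, query_idx, k):
    """Indices of the k points closest to points[query_idx] (the query itself is excluded)."""
    q = points[query_idx]
    others = [i for i in range(len(points)) if i != query_idx]
    others.sort(key=lambda i: math.dist(points[i], q))
    return others[:k]

def radius_neighbors(points, query_idx, radius):
    """Indices of all points within `radius` of points[query_idx] (the query itself is excluded)."""
    q = points[query_idx]
    return [
        i for i in range(len(points))
        if i != query_idx and math.dist(points[i], q) <= radius
    ]

# Toy cloud (hypothetical values): three points close together and one far-away outlier.
points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.2, 0.0), (5.0, 5.0, 5.0)]

knn = knn_neighbors(points, 0, k=3)          # always returns k points, so the outlier is included
rad = radius_neighbors(points, 0, radius=0.5)  # only points that are actually nearby
```

With a fixed k, the neighborhood always has the same size and can reach distant points in sparse regions; with a fixed radius, the neighborhood has a fixed spatial extent and a variable number of points.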
- Research Article
26
- 10.1080/01431161.2023.2297177
- Jan 17, 2024
- International Journal of Remote Sensing
Point clouds have emerged as the most popular three-dimensional (3D) data format in recent years for several scientific and industrial applications. Point cloud semantic segmentation, a crucial stage in 3D analysis and scene comprehension, has piqued researchers' interest. Deep learning-based processing has become more feasible with the increasing availability of point cloud acquisition tools, that is, LiDAR systems, at the user end. Point cloud learning has achieved tremendous success in object detection, object categorization, and semantic segmentation. To summarize recent works and their chronological development, a comprehensive review of projection-, voxel-, and direct point-based point cloud semantic segmentation methods is performed from various perspectives. The commonly used point cloud benchmark datasets and their characteristics are discussed, and they are used for the performance analysis and comparison of several state-of-the-art segmentation methods. The quantitative performance analysis of these deep learning models summarizes the trend in semantic segmentation of point clouds. In the context of point cloud semantic segmentation, the various methods have specific roles. Based on the review of how the methods work and their performance analysis, it is concluded that projection-based methods prioritize efficiency, which is ideal when a high-performance computing system is unavailable. Voxel-based methods capture overall context, serving well in 3D object classification. Point-based approaches excel in fine details and efficiency, and are suited for tasks like 3D semantic segmentation. Choosing a suitable method depends on the task, data, and resources. KPConv and DGCNN are popular choices, especially for precision and adaptability to point density. However, method performance varies, underlining the need for tailored selection. Hybrid approaches, which combine the strengths of several methods, promise superior results.
- Research Article
65
- 10.1186/s40494-022-00844-w
- Jan 4, 2023
- Heritage Science
Automated Heritage Building Information Modelling (HBIM) from point cloud data has been researched in the last decade, as HBIM can be the integrated data model that brings together diverse sources of complex cultural content relating to heritage buildings. However, HBIM modelling from the scan data of heritage buildings is mainly manual, and image processing techniques are insufficient for segmenting point cloud data in a way that speeds up and enhances the current HBIM modelling workflow. Artificial Intelligence (AI) based deep learning methods such as PointNet have been introduced in the literature for point cloud segmentation. Yet, their use is mainly for manufactured and clear geometric shapes and components. This paper tests and analyses to what extent PointNet-based segmentation is applicable to heritage buildings and how PointNet can be used for point cloud segmentation with the best possible accuracy (ACC). In this study, classification and segmentation processes are performed on the 3D point cloud data of heritage buildings in Gaziantep, Turkey. Accordingly, it proposes a novel activity workflow for point cloud segmentation of heritage buildings with deep learning using PointNet. Twenty-eight case study heritage buildings are used, and AI training is performed using five feature labels for segmentation, namely walls, roofs, floors, doors, and windows, for each of these 28 heritage buildings. The dataset is divided into clusters, with 80% used as the training dataset and 20% as the prediction test dataset. The PointNet algorithm was unable to provide sufficient accuracy in segmenting the point clouds due to deformation and deterioration in the existing condition of the heritage case study buildings. However, if the PointNet algorithm is trained with the restitution-based heritage data, which is called synthetic data in the research, it provides high accuracy. 
Thus, the proposed approach can build the baseline for the accurate classification and segmentation of the heritage buildings.
- Research Article
12
- 10.3390/rs15010243
- Dec 31, 2022
- Remote Sensing
Multispectral LiDAR technology can simultaneously acquire spatial geometric data and multispectral wavelength intensity information, which can provide richer attribute features for semantic segmentation of point cloud scenes. However, due to the disordered distribution and huge volume of point cloud data, it is still a challenging task to accomplish fine-grained semantic segmentation of point clouds from large-scale multispectral LiDAR data. To deal with this situation, we propose a deep learning network that can leverage contextual semantic information to complete the semantic segmentation of large-scale point clouds. In our network, we fuse local geometry and feature content based on 3D spatial geometric associativity and embed the result into a backbone network. In addition, to cope with the problem of redundant point cloud feature distribution found in the experiment, we designed a data preprocessing step based on principal component extraction to improve the processing capability of the proposed network on the applied multispectral LiDAR data. Finally, we conduct a series of comparative experiments using multispectral LiDAR point clouds of real land cover in order to objectively evaluate the performance of the proposed method compared with other advanced methods. The obtained results confirm that the proposed method achieves satisfactory results in real point cloud semantic segmentation. Moreover, the quantitative evaluation metrics show that it reaches state-of-the-art performance.
- Research Article
75
- 10.5194/isprs-archives-xlii-2-w15-735-2019
- Aug 23, 2019
- The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Abstract. Cultural Heritage is a testimony of past human activity, and, as such, its objects exhibit great variety in their nature, size and complexity: from small artefacts and museum items to cultural landscapes, from historical buildings and ancient monuments to city centers and archaeological sites. Cultural Heritage around the globe suffers from wars, natural disasters and human negligence. The importance of digital documentation is well recognized and there is increasing pressure to document our heritage both nationally and internationally. For this reason, the three-dimensional scanning and modeling of cultural heritage sites and artifacts have increased remarkably in recent years. The semantic segmentation of point clouds is an essential step of the entire pipeline; in fact, it allows complex architectures to be decomposed into single elements, which are then enriched with meaningful information within Building Information Modelling software. Nevertheless, this step is very time-consuming and completely entrusted to the manual work of domain experts, and is far from being automated. This work describes a method to label and cluster a point cloud automatically based on a supervised Deep Learning approach, using a state-of-the-art Neural Network called PointNet++. Although other methods are known, we have chosen PointNet++ because it has achieved significant results in classifying and segmenting 3D point clouds. PointNet++ has been tested and improved by training the network with annotated point clouds coming from a real survey, in order to evaluate how performance changes according to the input training data. The work may be of great interest to the research community dealing with point cloud semantic segmentation, since it makes public a labelled dataset of CH elements for further tests.
- Research Article
- 10.1080/13556207.2025.2518661
- May 4, 2025
- Journal of Architectural Conservation
Heritage or Historic BIM (HBIM), a specialised application of Building Information Modelling (BIM) for the preservation and management of historic buildings, offers transformational opportunities for the heritage conservation sectors. However, this potential has not been fully explored, with HBIM applications mostly used as mere archival documentation for heritage architecture. As such, this study investigates the opportunities and challenges in adopting HBIM for preserving and managing heritage buildings. The study adopts a qualitative research strategy comprising a literature review and expert interviews to explore the perspective of heritage conservation stakeholders on HBIM. The collected data were analysed using thematic analysis to identify the current state of HBIM adoption, its benefits, and its challenges. Findings reveal that while HBIM offers significant opportunities, such as improved archival documentation, visualisation, and maintenance planning, its adoption remains limited due to high costs, lack of expertise, and resistance to new technologies. This study acts as a reference point illuminating the need for increased awareness, training, and investment in HBIM to fully harness its potential, positioning it as a crucial tool for the sustainable management of heritage assets. The originality of this study lies in its primary focus on HBIM, an application that, unlike BIM, has been underexplored.
- Research Article
- 10.1080/15583058.2025.2586029
- Nov 9, 2025
- International Journal of Architectural Heritage
Heritage Building Information Modelling (H-BIM) plays an increasing role in the Cultural Heritage sector. However, a key challenge persists in Scan-to-BIM, the process of transforming point cloud data into usable models. Recent advancements in machine learning have enhanced Scan-to-BIM, particularly by enabling more efficient 3D point cloud processing through semantic segmentation. In addition to methods based on the direct segmentation of 3D point clouds, there are also indirect approaches that rely on the intermediate segmentation of images representative of the 3D scene. However, their development remains limited due to the need for large datasets, which are currently unavailable for images of historical buildings and whose creation requires labour-intensive manual operations. This study introduces a semi-automated annotation technique to reduce per-pixel image annotation time by projecting manually assigned labels from 3D point clouds onto 2D images. The generated images can then support the training of image-based semantic segmentation models, which in turn can be integrated into multi-view or projection-based strategies for transferring the results back into 3D space. When compared to manual annotation and existing semi-automatic tools, our procedure, applied to selected case studies, yielded significant quantitative improvements in evaluation metrics such as Global Accuracy and Intersection over Union.
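The idea of projecting labels from a 3D point cloud onto a 2D image can be sketched as follows. This is an illustrative assumption, not the procedure described in the study: it uses an idealized pinhole camera at the origin looking along +z, with a hypothetical focal length `f` (in pixels) and principal point `(cx, cy)`, and resolves occlusions with a simple per-pixel depth buffer.

```python
def project_labels(points, labels, f, cx, cy, width, height, background=-1):
    """Rasterize per-point labels into a 2D label mask using a pinhole camera model.

    Points are (x, y, z) in the camera frame; the nearest point wins each pixel.
    """
    mask = [[background] * width for _ in range(height)]
    depth = [[float("inf")] * width for _ in range(height)]
    for (x, y, z), label in zip(points, labels):
        if z <= 0:
            continue  # point is behind the camera
        u = int(round(f * x / z + cx))  # image column
        v = int(round(f * y / z + cy))  # image row
        if 0 <= u < width and 0 <= v < height and z < depth[v][u]:
            depth[v][u] = z
            mask[v][u] = label
    return mask

# Toy example (hypothetical values): two points fall on the same pixel, one is behind the camera.
points = [(0.0, 0.0, 2.0), (0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 0.0, -1.0)]
labels = [0, 1, 2, 3]
mask = project_labels(points, labels, f=1.0, cx=1.0, cy=1.0, width=3, height=3)
```

In this toy case the centre pixel takes label 1, because that point is closer to the camera than the point with label 0. Real point clouds are sparse relative to the image grid, so a practical pipeline would also need hole filling or point splatting, which this sketch omits.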
- Research Article
94
- 10.1016/j.isprsjprs.2021.03.001
- Mar 23, 2021
- ISPRS Journal of Photogrammetry and Remote Sensing
A point-based deep learning network for semantic segmentation of MLS point clouds
- Research Article
12
- 10.1049/cvi2.12250
- Nov 2, 2023
- IET Computer Vision
With the popularity and advancement of 3D point cloud data acquisition technologies and sensors, deep learning-based research into 3D point clouds has made considerable strides. The semantic segmentation of point clouds, a crucial step in comprehending 3D scenes, has drawn much attention. The accuracy and effectiveness of fully supervised semantic segmentation have greatly improved with the increase in the number of accessible datasets. However, these achievements rely on time-consuming and expensive full labelling. To address these issues, research on weakly supervised learning has recently exploded. These methods train neural networks to tackle 3D semantic segmentation tasks with fewer point labels. In addition to a thorough overview of the history and current state of the art in weakly supervised semantic segmentation of 3D point clouds, this review provides a detailed description of the most widely used data acquisition sensors, a list of publicly accessible benchmark datasets, and a look ahead to potential future development directions.
- Research Article
2
- 10.3390/agriculture15010074
- Dec 31, 2024
- Agriculture
Semantic segmentation of three-dimensional (3D) plant point clouds at the stem-leaf level is foundational and indispensable for high-throughput tomato phenotyping systems. However, existing semantic segmentation methods often suffer from issues such as low precision and slow inference speed. To address these challenges, we propose an innovative encoding-decoding structure incorporating voxel sparse convolution (SpConv) and attention-based feature fusion (VSCAFF) to enhance semantic segmentation of the point clouds of high-resolution tomato seedling images. Tomato seedling point clouds from the Pheno4D dataset, labeled into the semantic classes 'leaf', 'stem', and 'soil', are used for the semantic segmentation. To reduce the number of parameters and thereby further improve the inference speed, the SpConv module is designed to function through the residual concatenation of the skeleton convolution kernel and the regular convolution kernel. The attention-based feature fusion module assigns attention weights to the voxel diffusion features and the point features, in order to avoid the ambiguity caused by the diffusion module, where points with different semantics share the same characteristics, and to suppress noise. Finally, to counter the class bias in model training caused by the uneven distribution of point cloud classes, a composite loss function combining Lovász-Softmax and weighted cross-entropy is introduced to supervise the model training and improve its performance. The results show that the mIoU of VSCAFF is 86.96%, outperforming PointNet, PointNet++, and DGCNN. The IoU of VSCAFF reaches 99.63% in the soil class, 64.47% in the stem class, and 96.72% in the leaf class. Its inference latency of 35 ms is lower than that of PointNet++ and DGCNN. 
The results demonstrate that VSCAFF has high performance and inference speed for semantic segmentation of high-resolution tomato point clouds, and can provide technical support for the high-throughput automatic phenotypic analysis of tomato plants.
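For readers unfamiliar with the IoU and mIoU metrics quoted in this abstract, the following is a minimal sketch of how they are commonly computed from a confusion matrix. The matrix values are made up for illustration and are not taken from the paper; the function names are likewise hypothetical.

```python
def per_class_iou(confusion):
    """Per-class IoU = TP / (TP + FP + FN), with rows = true class, columns = predicted class."""
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                        # true class c, predicted as another class
        fp = sum(confusion[r][c] for r in range(n)) - tp   # another class, predicted as c
        denom = tp + fp + fn
        ious.append(tp / denom if denom else float("nan"))
    return ious

def mean_iou(ious):
    """Unweighted mean over classes, skipping classes with an undefined (NaN) IoU."""
    valid = [v for v in ious if v == v]  # NaN is the only value not equal to itself
    return sum(valid) / len(valid)

# Illustrative confusion matrix for three classes (soil, stem, leaf); counts are invented.
confusion = [
    [8, 0, 2],
    [1, 3, 1],
    [0, 1, 9],
]
ious = per_class_iou(confusion)
miou = mean_iou(ious)
```

Because mIoU is an unweighted mean over classes, a small and difficult class weighs as much as a large one. Averaging the three per-class values quoted in the abstract gives about 86.9%, in line with the reported mIoU up to rounding.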