A self-attention based global feature enhancing network for semantic segmentation of large-scale urban street-level point clouds
- Research Article
24
- 10.3390/rs15092371
- Apr 30, 2023
- Remote Sensing
The accurate semantic segmentation of point cloud data is the basis for their application in the inspection of extra high-voltage transmission lines (EHVTL). As deep learning evolves, point-wise-based deep neural networks have shown great potential for the semantic segmentation of EHVTL point clouds. However, EHVTL point cloud data are characterized by a large data volume and significant class imbalance. Therefore, the down-sampling method and point cloud feature extraction method used in current point-wise-based deep neural networks hardly meet the needs of computational accuracy and efficiency. In this paper, we proposed a two-step down-sampling method and a point cloud feature extraction method based on local feature aggregation of the point clouds after down-sampling in each layer of the model (LFAPAD). We then established a deep neural network named PowerLine-Net for the semantic segmentation of the EHVTL point clouds. Furthermore, in order to test and analyze the performance of PowerLine-Net, we constructed a point cloud dataset for the EHVTL scenes. Using this dataset and the Semantic3D dataset, we implemented network parameter testing, semantic segmentation, and an accuracy comparison of different networks based on PowerLine-Net. The results illustrate that the semantic segmentation model proposed in this paper has a high computational efficiency and accuracy in the semantic segmentation of EHVTL point clouds. Compared with conventional deep neural networks, including PointCNN, KPConv, SPG, PointNet++, and RandLA-Net, PowerLine-Net also achieves a higher accuracy in the semantic segmentation of EHVTL point clouds. Moreover, based on the results predicted by PowerLine-Net, the risk point detection for EHVTL point clouds has been achieved, which demonstrates the important value of this network in practical applications. 
In addition, as shown by the results of Semantic3D, PowerLine-Net also achieves a high segmentation accuracy, which proves its powerful capability and wide applicability in semantic segmentation for the point clouds of large-scale scenes.
- Research Article
- 10.1093/forestry/cpaf062
- Oct 14, 2025
- Forestry: An International Journal of Forest Research
Semantic segmentation of point clouds using deep learning (DL) has been the subject of research in forestry in recent years due to its potential applications. Several scientific and management disciplines, such as biodiversity monitoring, ecosystem carbon assessments, or forest management could benefit from this technique. However, it requires manual segmentation of point clouds to be used as training data. This process is highly labour-intensive and time-consuming, and there is a notable lack of publicly available datasets to support the development of accurate DL semantic segmentation models for forestry and forest ecology applications. Here, we present SegmentedForests, a curated dataset of manually segmented ground-based point clouds from forest plots, specifically designed to facilitate the training and validation of semantic segmentation models. This publicly available dataset contains >920 million labelled points from 14 forest plots, acquired using both terrestrial laser scanning (TLS) and mobile laser scanning (MLS) technologies. It covers two hectares of broadleaf, conifer, and mixed stands from different bioclimatic regions and features >1600 trees across 16 tree species. Each point cloud is labelled into multiple vegetation classes (up to 16), such as tree stems, branches, grass, shrubs, and down wood, as well as non-vegetation elements commonly present in forest scenes, including rocks, people, and stakes. Data splits to facilitate DL model development using our dataset are provided as well. The dataset is available at https://zenodo.org/records/17396681. By releasing this annotated dataset, we seek to address the critical need for publicly available, high-quality training data for DL models that perform semantic segmentation of ground-based point clouds in forest ecosystems.
- Research Article
94
- 10.1016/j.isprsjprs.2021.03.001
- Mar 23, 2021
- ISPRS Journal of Photogrammetry and Remote Sensing
A point-based deep learning network for semantic segmentation of MLS point clouds
- Research Article
20
- 10.3390/rs14205134
- Oct 14, 2022
- Remote Sensing
With the rapid development of cities, semantic segmentation of urban scenes, as an important and effective imaging method, can accurately obtain the distribution information of typical urban ground features, reflecting the development scale and the level of greenery in the cities. There are some challenging problems in the semantic segmentation of point clouds in urban scenes, including different scales, imbalanced class distribution, and missing data caused by occlusion. Based on the point cloud semantic segmentation network RandLA-Net, we propose the semantic segmentation networks RandLA-Net++ and RandLA-Net3+. The RandLA-Net++ network is a deep fusion of the shallow and deep features of the point clouds, and a series of nested dense skip connections is used between the encoder and decoder. RandLA-Net3+ is based on the multi-scale connection between the encoder and decoder; it also connects internally within the decoder to capture fine-grained details and coarse-grained semantic information at a full scale. We also propose incorporating dilated convolution to increase the receptive field and compare the improvement effect of different loss functions on sample class imbalance. After verification and analysis of our labeled urban scene LiDAR point cloud dataset—called NJSeg-3D—the mIoU of the RandLA-Net++ and RandLA-Net3+ networks is 3.4% and 3.2% higher, respectively, than the benchmark network RandLA-Net.
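Several abstracts in this list report results as mIoU (mean intersection over union). As a reading aid, the metric can be sketched as below. This is a minimal illustration, not code from any of the listed papers; the function name `mean_iou` and the choice to skip classes absent from both labels and predictions are assumptions.

```python
import numpy as np

def mean_iou(y_true, y_pred, num_classes):
    """Mean intersection-over-union from per-point integer class labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    # Confusion matrix: rows are true classes, columns are predicted classes.
    cm = np.bincount(
        num_classes * y_true + y_pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)
    tp = np.diag(cm).astype(float)
    # Union = predicted as class c + truly class c - counted twice (tp).
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    # Assumption: classes that never occur in labels or predictions are ignored.
    present = union > 0
    return float((tp[present] / union[present]).mean())

# Per-class IoU here is 1/3, 2/3 and 1/2, so the mean is 0.5.
print(mean_iou([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], num_classes=3))
```

Because every class contributes equally to the mean, a rare class that is segmented poorly lowers mIoU far more than it lowers overall accuracy, which is why the metric is common for class-imbalanced point clouds.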
- Research Article
12
- 10.3390/rs15010243
- Dec 31, 2022
- Remote Sensing
Multispectral LiDAR technology can simultaneously acquire spatial geometric data and multispectral wavelength intensity information, which can provide richer attribute features for semantic segmentation of point cloud scenes. However, due to the disordered distribution and huge number of point clouds, it is still a challenging task to accomplish fine-grained semantic segmentation of point clouds from large-scale multispectral LiDAR data. To deal with this situation, we propose a deep learning network that can leverage contextual semantic information to complete the semantic segmentation of large-scale point clouds. In our network, we work on fusing local geometry and feature content based on 3D spatial geometric associativity and embed it into a backbone network. In addition, to cope with the problem of redundant point cloud feature distribution found in the experiment, we designed a data preprocessing with principal component extraction to improve the processing capability of the proposed network on the applied multispectral LiDAR data. Finally, we conduct a series of comparative experiments using multispectral LiDAR point clouds of real land cover in order to objectively evaluate the performance of the proposed method compared with other advanced methods. With the obtained results, we confirm that the proposed method achieves satisfactory results in real point cloud semantic segmentation. Moreover, the quantitative evaluation metrics show that it reaches state-of-the-art.
- Research Article
5
- 10.1186/s40494-024-01367-2
- Aug 2, 2024
- Heritage Science
Semantic segmentation of point cloud data of architectural cultural heritage is of significant importance for HBIM modeling, disease extraction and analysis, and heritage restoration research fields. In the semantic segmentation task of architectural point cloud data, especially for the protection and analysis of architectural cultural heritage, the previous deep learning methods have poor segmentation effects due to the complexity and unevenness of the data, the high geometric feature similarity between different components, and the large scale changes. To this end, this paper proposes a novel encoder-decoder architecture called DSC-Net. It consists of an encoder-decoder structure based on point random sampling and several fully connected layers for semantic segmentation. To overcome the loss of key features caused by random downsampling, DSC-Net has developed two new feature aggregation schemes: the enhanced dual attention pooling module and the global context feature module, to learn discriminative features for the challenging scenes mentioned above. The former fully considers the topology and semantic similarity of neighboring points, generating attention features that can distinguish categories with similar structures. The latter uses spatial location and neighboring volume ratio to provide an overall view of different types of architectural scenes, helping the network understand the spatial relationships and hierarchical structures between different architectural elements. The proposed modules can be easily embedded into various network architectures for point cloud semantic segmentation. We conducted experiments on multiple datasets, including the ancient architecture dataset, the ArCH architectural cultural heritage dataset, and the publicly available architectural segmentation dataset S3DIS. The results show that the mIoU reached 63.56%, 55.84%, and 71.03% respectively. 
The experimental results prove that our method has the best segmentation effect in dealing with challenging architectural cultural heritage data and also demonstrates its practicality in a wider range of architectural point cloud segmentation applications.
- Research Article
6
- 10.1109/jstars.2023.3264240
- Jan 1, 2023
- IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
The semantic segmentation of light detection and ranging (LiDAR) point clouds plays an important role in 3-D scene intelligent perception and semantic modeling. The unstructured, sparse and uneven characteristics of point clouds pose great challenges to the representation of the local geometric shapes, which degrades semantic segmentation performance. To address the challenges of describing local geometric shapes due to unstructured and sparse 3-D point clouds, this article proposes a category-contrastive-guided graph convolutional network (CGGC-Net) for the semantic segmentation of LiDAR point clouds. First, a detailed geometric structure of the raw point clouds is encoded to represent the inherent geometric pattern within the local neighborhood. At the same time, the geometric structures information is transmitted across multiple layers, so that the geometric structure encoding information containing different receptive fields and richer neighborhood spatial structure can be aggregated. Following this, the graph convolution neural network uses the edge convolution layer to adaptively describe the semantic correlation between the query point and its neighboring points, and combines the attention mechanism to gather the surrounding feature information to the query point. As a result, the graph convolution neural network and attention mechanism are iteratively stacked for the aggregation and fusion of spatial context semantic information, to generate highly discriminative semantic feature representation. Finally, the superparameters of the model are learned through a multitask optimization strategy guided by category-aware contrastive loss and cross-entropy loss. Experiments are conducted on the public SemanticKITTI dataset and the Stanford large-scale 3-D Indoor Spaces dataset to demonstrate the effectiveness and reliability of the proposed CGGC-Net from both quantitative and qualitative perspectives. 
The results indicate its capability of automatically classifying LiDAR point clouds, with a mean intersection-over-union of 58.4%. Moreover, multiple comparative experiments also demonstrate the superior performance of the proposed method, exceeding state-of-the-art methods.
- Research Article
72
- 10.1145/3409262
- Dec 3, 2020
- Journal on Computing and Cultural Heritage
Historical heritage is demanding robust pipelines for obtaining Heritage Building Information Modeling models that are fully interoperable and rich in their informative content. The definition of efficient Scan-to-BIM workflows represents a very important step toward a more efficient management of the historical real estate, as creating structured three-dimensional (3D) models from point clouds is complex and time-consuming. In this scenario, semantic segmentation of 3D Point Clouds is gaining more and more attention, since it might help to automatically recognize historical architectural elements. Recent Deep Learning approaches have proved to provide reliable and affordable degrees of automation in other contexts, such as road scene understanding. However, semantic segmentation is particularly challenging in historical and classical architecture, due to the complexity of the shapes and the limited repeatability of elements across different buildings, which makes it difficult to define common patterns within the same class of elements. Furthermore, as Deep Learning models require a considerably large amount of annotated data to be trained and tuned to properly handle unseen scenes, the lack of (big) publicly available annotated point clouds in the historical building domain is a huge problem, which in fact blocks the research in this direction. However, creating a critical mass of annotated point clouds by manual annotation is very time-consuming and impractical. To tackle this issue, in this work we explore the idea of leveraging synthetic point cloud data to train Deep Learning models to perform semantic segmentation of point clouds obtained via Terrestrial Laser Scanning. The aim is to provide a first assessment of the use of synthetic data to drive Deep Learning-based semantic segmentation in the context of historical buildings. To achieve this purpose, we present an improved version of the Dynamic Graph CNN (DGCNN) named RadDGCNN. 
The main improvement consists in exploiting the radius distance. In our experiments, we evaluate the trained models on a (publicly available) synthetic dataset of two different historical buildings: the Ducal Palace in Urbino, Italy, and Palazzo Ferretti in Ancona, Italy. RadDGCNN yields good results, demonstrating improved segmentation performance on the real TLS datasets.
- Research Article
- 10.1080/2150704x.2024.2343131
- Apr 23, 2024
- Remote Sensing Letters
This letter presents a long-range contextual dependency enhanced network (LCDE-Net) for semantic segmentation of large-scale point clouds, which employs a U-shaped framework. Firstly, point clouds are subsampled with grid sampling and fed into convolutional layers to learn more representative local features of points. Then global and local encoders (GLE) are designed to exploit long-range contextual dependencies and local features simultaneously. The core of GLE consists of two parts: a global feature enhancement (GFE) module and a feature channel modulation (FCM) module. Next, in the decoder layers, the encoded features are upsampled by nearest-neighbour interpolation and aggregated with the intermediate encoded features through skip connections to capture multi-scale discriminative features for semantic segmentation of point clouds. Finally, each point's label is assigned by a fully connected layer and a Softmax classifier. The proposed method is evaluated on two benchmark datasets. Experimental results show that LCDE-Net achieves a mean intersection over union (mIoU) of 78.6% on Semantic3D and 68.2% on S3DIS, the highest among the compared methods. The code of LCDE-Net is available at https://github.com/xrzmyz/LCDE-Net.
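The grid sampling step mentioned in this abstract can be illustrated with a short sketch. It is not taken from the LCDE-Net repository: the function name `grid_subsample`, the `cell_size` parameter, and the choice of the cell centroid as the representative point are all assumptions made for illustration.

```python
import numpy as np

def grid_subsample(points, cell_size):
    """Keep one point (the centroid) per occupied grid cell."""
    points = np.asarray(points, dtype=float)
    # Integer index of the grid cell that contains each point.
    cells = np.floor(points / cell_size).astype(np.int64)
    # Group points by cell; `inverse` maps each point to its cell's row.
    _, inverse, counts = np.unique(
        cells, axis=0, return_inverse=True, return_counts=True
    )
    inverse = inverse.reshape(-1)  # guard against shape differences across NumPy versions
    # Sum the coordinates per cell, then divide by the point count.
    sums = np.zeros((counts.size, points.shape[1]))
    np.add.at(sums, inverse, points)
    return sums / counts[:, None]

cloud = np.array([[0.1, 0.1, 0.1], [0.2, 0.2, 0.2], [1.5, 0.1, 0.1]])
print(grid_subsample(cloud, cell_size=1.0))  # two cells are occupied, so two points remain
```

Unlike random sampling, this keeps the output density roughly uniform, since a densely scanned region and a sparsely scanned one both contribute at most one point per cell.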
- Research Article
5
- 10.3390/rs15215248
- Nov 5, 2023
- Remote Sensing
Semantic segmentation of point clouds provided by airborne LiDAR surveys in urban scenes is a great challenge. This is because point clouds at the boundaries of different types of objects are easily mixed and are geometrically similar. In addition, the 3D descriptions of the same type of object have different scales. To address these problems, a fusion attention convolutional network (SMAnet) is proposed in this study. The fusion attention module includes a self-attention module (SAM) and a multi-head attention module (MAM). The SAM captures feature information according to the correlation of adjacent points and can effectively distinguish mixed point clouds with similar geometric features. The MAM strengthens connections among point clouds according to different subspace features, which is beneficial for distinguishing point clouds at different scales. In feature extraction, lightweight multi-scale feature extraction layers are used to effectively utilize local information from different neighbor fields. Additionally, in order to solve the feature externalization problem and expand the network receptive field, the SoftMax-stochastic pooling (SSP) algorithm is proposed to extract global features. The ISPRS 3D Semantic Labeling Contest dataset was chosen in this study for the point cloud segmentation experiments. Results showed that the overall accuracy and average F1-score of SMAnet reach 85.7% and 75.1%, respectively. It therefore outperforms current common algorithms. The proposed model also achieved good results on the GML(B) dataset, which shows that the model has good generalization ability.
- Research Article
26
- 10.1080/01431161.2023.2297177
- Jan 17, 2024
- International Journal of Remote Sensing
Point clouds have emerged as the most popular three-dimensional (3D) data format in recent years for several scientific and industrial applications. Point cloud semantic segmentation, a crucial stage in 3D analysis and scene comprehension, has piqued researchers' interest. Deep learning-based processing has become more feasible with the increasing availability of point cloud acquisition tools, that is, LiDAR systems, at the user end. Point cloud learning has achieved tremendous success in object detection, object categorization, and semantic segmentation. To summarize recent works in their chronological development, a comprehensive review of projection-, voxel-, and direct point-based point cloud semantic segmentation methods is performed from various perspectives. The commonly used point cloud benchmark datasets and their characteristics are discussed, and they are used for the performance analysis and comparison of several state-of-the-art segmentation methods. The quantitative performance analysis of these deep learning models summarizes the trend of semantic segmentation of point clouds. In the context of point cloud semantic segmentation, the various methods have specific roles. Based on the review of how the methods work and their performance analysis, it is concluded that projection-based methods prioritize efficiency, which is ideal when high-performance computing systems are unavailable. Voxel-based methods capture overall context, serving well in 3D object classification. Point-based approaches excel in fine details and efficiency, suited for tasks like 3D semantic segmentation. Choosing the suitable method depends on the task, data, and resources. KPConv and DGCNN are popular choices, especially for precision and adaptability to point density. However, method performance varies, underlining the need for tailored selection. Hybrid approaches, combining method strengths, promise superior results.
- Research Article
2
- 10.3390/agriculture15010074
- Dec 31, 2024
- Agriculture
Semantic segmentation of three-dimensional (3D) plant point clouds at the stem-leaf level is foundational and indispensable for high-throughput tomato phenotyping systems. However, existing semantic segmentation methods often suffer from issues such as low precision and slow inference speed. To address these challenges, we propose an innovative encoding-decoding structure, incorporating voxel sparse convolution (SpConv) and attention-based feature fusion (VSCAFF) to enhance semantic segmentation of the point clouds of high-resolution tomato seedling images. Tomato seedling point clouds from the Pheno4D dataset, labeled into the semantic classes 'leaf', 'stem', and 'soil', are used for the semantic segmentation. To reduce the number of parameters and so further improve the inference speed, the SpConv module is designed to function through the residual concatenation of the skeleton convolution kernel and the regular convolution kernel. The attention-based feature fusion module assigns attention weights to the voxel diffusion features and the point features, which avoids the ambiguity of points with different semantics having the same characteristics caused by the diffusion module and also suppresses noise. Finally, to counter the class bias in model training caused by the uneven distribution of point cloud classes, a composite loss function of Lovász-Softmax and weighted cross-entropy is introduced to supervise the model training and improve its performance. The results show that the mIoU of VSCAFF is 86.96%, which exceeds that of PointNet, PointNet++, and DGCNN. The IoU of VSCAFF reaches 99.63% in the soil class, 64.47% in the stem class, and 96.72% in the leaf class. The inference latency of 35 ms is lower than that of PointNet++ and DGCNN. 
The results demonstrate that VSCAFF has high performance and inference speed for semantic segmentation of high-resolution tomato point clouds, and can provide technical support for the high-throughput automatic phenotypic analysis of tomato plants.
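The weighted cross-entropy term mentioned in this abstract can be sketched as follows. This is an illustrative NumPy version, not the VSCAFF implementation: the function name, the example class weights, and the normalisation by the summed weights (one common convention) are assumptions, and the Lovász-Softmax term of the composite loss is not shown.

```python
import numpy as np

def weighted_cross_entropy(logits, labels, class_weights):
    """Cross-entropy in which each point is weighted by its true class."""
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels)
    weights = np.asarray(class_weights, dtype=float)
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Negative log-likelihood of the true class for each point.
    nll = -log_probs[np.arange(labels.size), labels]
    point_weights = weights[labels]
    # Assumption: normalise by the summed weights rather than the point count.
    return float((point_weights * nll).sum() / point_weights.sum())

# Hypothetical weights that up-weight a rare class (index 1, e.g. 'stem').
logits = np.array([[2.0, 0.5, 0.1], [0.2, 0.3, 1.5], [1.0, 1.0, 1.0]])
labels = np.array([0, 1, 2])
print(weighted_cross_entropy(logits, labels, class_weights=[0.5, 3.0, 0.5]))
```

Raising the weight of a rare class makes its misclassified points dominate the loss, which pushes training away from simply predicting the majority classes.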
- Research Article
5
- 10.1016/j.ophoto.2024.100061
- Mar 1, 2024
- ISPRS Open Journal of Photogrammetry and Remote Sensing
Real-time semantic segmentation of point clouds has increasing importance in applications related to 3D city modelling and mapping, automated inventory of forests, autonomous driving and mobile robotics. Current state-of-the-art point cloud semantic segmentation methods rely heavily on the availability of 3D laser scanning data. This is problematic for low-latency, real-time applications that use data from high-precision mobile laser scanners, as those are typically 2D line scanning devices. In this study, we experiment with real-time semantic segmentation of high-density multispectral point clouds collected from 2D line scanners in urban environments using encoder-decoder convolutional neural network architectures. We introduce a rasterized multi-scan input format that can be constructed exclusively from the raw (non-georeferenced profiles) 2D laser scanner measurement stream without odometry information. In addition, we investigate the impact of multispectral data on the segmentation accuracy. The dataset used for training, validation and testing was collected with the multispectral FGI AkhkaR4-DW backpack laser scanning system operating at the wavelengths of 905 nm and 1550 nm, and consists of 228 million points in total (39 583 scans). The data was divided into 13 classes that represent various targets in urban environments. The results show that the increased spatial context of the multi-scan format improves the segmentation performance on the single-wavelength lidar dataset from 45.4 mIoU (a single scan) to 62.1 mIoU (24 consecutive scans). In the multispectral point cloud experiments we achieved a 71% and 28% relative increase in the segmentation mIoU (43.5 mIoU) as compared to the purely single-wavelength reference experiments, in which we achieved 25.4 mIoU (905 nm) and 34.1 mIoU (1550 nm). 
Our findings show that it is possible to semantically segment 2D line scanner data with good results by combining consecutive scans without the need for odometry information. The results also serve as motivation for developing multispectral mobile laser scanning systems that can be used in challenging urban surveys.
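The relative increases quoted in this abstract follow directly from the reported mIoU values. The short check below reproduces them; the helper name `relative_increase` is introduced here for illustration only.

```python
def relative_increase(new, baseline):
    """Relative change from `baseline` to `new`, in percent."""
    return (new - baseline) / baseline * 100.0

multispectral_miou = 43.5  # reported multispectral result

# Against the two single-wavelength reference experiments.
print(round(relative_increase(multispectral_miou, 25.4)))  # 905 nm  -> 71
print(round(relative_increase(multispectral_miou, 34.1)))  # 1550 nm -> 28

# The multi-scan gain is reported in absolute terms: 45.4 -> 62.1 mIoU.
print(round(62.1 - 45.4, 1))  # 16.7 mIoU points
```

Note that the 71% and 28% figures are relative improvements, while the single-scan to multi-scan comparison is stated as absolute mIoU values, so the two kinds of number are not directly comparable.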
- Research Article
8
- 10.5194/isprs-annals-iv-4-w8-139-2019
- Sep 23, 2019
- ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Abstract. Automatic semantic segmentation of point clouds observed in a complex 3D urban scene is a challenging issue. Semantic segmentation of urban scenes based on machine learning algorithms requires appropriate features to distinguish objects in mobile terrestrial and airborne LiDAR point clouds at the point level. In this paper, we propose a pointwise semantic segmentation method based on features derived from Difference of Normals and on "directional height above" features, which compare the height difference between a given point and its neighbors in eight directions, in addition to features based on normal estimation. A random forest classifier is used to classify points in mobile terrestrial and airborne LiDAR point clouds. The results obtained from our experiments show that the proposed features are effective for semantic segmentation of mobile terrestrial and airborne LiDAR point clouds, especially for the vegetation, building and ground classes in airborne LiDAR point clouds of urban areas.
- Research Article
9
- 10.5194/isprs-archives-xliv-4-w1-2020-95-2020
- Sep 3, 2020
- The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Abstract. Point clouds obtained via Terrestrial Laser Scanning (TLS) surveys of historical buildings are generally transformed into semantically structured 3D models with manual and time-consuming workflows. The importance of automating this process is widely recognized within the research community. Recently, deep neural architectures have been applied for semantic segmentation of point clouds, but few studies have evaluated them in the Cultural Heritage domain, where complex shapes and mouldings make this task challenging. In this paper, we describe our experiments with the DGCNN architecture to semantically segment point clouds of historical buildings acquired with TLS. We propose a variation of the original approach where a radius distance based technique is used instead of K-Nearest Neighbors (KNN) to represent the neighborhood of points. We show that our approach provides better results by evaluating it on two real TLS point clouds, representing two Italian historical buildings: the Ducal Palace in Urbino and the Palazzo Ferretti in Ancona.
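The difference between a KNN neighborhood and a radius-based neighborhood can be seen in a small brute-force sketch. It is not the authors' implementation: the function names are hypothetical, distances are computed on raw coordinates (DGCNN itself can build neighborhoods in feature space), and the KNN helper assumes no duplicate points.

```python
import numpy as np

def pairwise_distances(points):
    """Brute-force Euclidean distance matrix (fine for small clouds only)."""
    diff = points[:, None, :] - points[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def knn_neighbors(points, k):
    """Indices of the k nearest neighbors of every point."""
    d = pairwise_distances(points)
    # Column 0 of the sorted order is the point itself (distance 0), so skip it.
    return np.argsort(d, axis=1)[:, 1:k + 1]

def radius_neighbors(points, radius):
    """Indices of all other points within `radius` of every point."""
    d = pairwise_distances(points)
    idx = np.arange(len(points))
    return [np.flatnonzero((row <= radius) & (idx != i)) for i, row in enumerate(d)]

cloud = np.array([[0.0, 0, 0], [1.0, 0, 0], [2.0, 0, 0], [10.0, 0, 0]])
print(knn_neighbors(cloud, k=2)[3])            # the isolated point still gets 2 neighbors
print(radius_neighbors(cloud, radius=1.5)[3])  # but has no neighbor within the radius
```

KNN always returns exactly k neighbors, however far away they are, whereas a radius query returns a variable number of neighbors bounded in distance, so an isolated point is not forced to borrow context from unrelated geometry.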