Articles published on Learning-Based Point
76 Search results
- Research Article
- 10.1088/2631-8695/ae575c
- Apr 1, 2026
- Engineering Research Express
- Lifeng Yin + 3 more
Abstract Future data-driven computer-aided tool selection systems require the capability to autonomously learn from both workpiece information and quality data. To address this need, this study proposes a deep learning-based point cloud analysis method for tool selection in machining. Traditional sampling approaches often fail to accurately extract informative point cloud data, particularly when distinguishing between rough and finish-machined workpieces, such as drilling components, with only subtle geometric differences. The proposed model efficiently acquires valuable point cloud samples during preprocessing and jointly learns both local and global geometric features of workpieces. To overcome the loss of critical geometric information inherent in conventional farthest point sampling, a differentiated sampling strategy is introduced to better capture edge and cutting-surface features. Furthermore, Fourier transform-based frequency domain analysis is employed to enhance the model’s ability to represent multi-scale geometric structures. Finally, a dual attention mechanism is developed to effectively integrate multi-modal features for more robust point cloud classification. Experimental results on the IWD dataset demonstrate that the proposed method achieves an accuracy of 98.89%, outperforming fifteen state-of-the-art baseline models.
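For readers unfamiliar with the baseline this abstract criticizes, the following is a minimal sketch of conventional farthest point sampling in plain Python. It is not the paper's differentiated sampling strategy, whose details the abstract does not give; the function name, the toy point cloud, and the choice of starting index are illustrative assumptions.

```python
import math

def farthest_point_sampling(points, k, start=0):
    """Pick k indices so that each new pick is the point farthest from all earlier picks."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    chosen = [start]
    # min_dist[i] holds the distance from point i to its nearest already-chosen point.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(chosen) < k:
        # The next sample is the point with the largest distance to the chosen set.
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        chosen.append(nxt)
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return chosen

# Hypothetical toy cloud: three points clustered near the origin, two far away.
cloud = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (1.0, 0.0, 0.0),
         (0.0, 1.0, 0.0), (0.05, 0.05, 0.0)]
sample = farthest_point_sampling(cloud, 3)
print(sample)
```

On this toy input the sampler keeps one point from the tight cluster and skips its two close neighbours, which illustrates how purely distance-driven sampling can thin out small, densely detailed regions.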
- Research Article
- 10.1016/j.ins.2026.123072
- Apr 1, 2026
- Information Sciences
- Hanzhe Shi + 4 more
BDNet: a deep learning-based point cloud denoising network for Brassica rapa
- Research Article
- 10.54097/bwanx125
- Jan 29, 2026
- Journal of Computing and Electronic Information Management
- Ying He + 2 more
To address the limited geometric representation capability and the coarse-grained context modeling in learning-based point cloud compression for LiDAR point cloud coding, we propose a structure-aware and context-modeling point cloud compression method (SACM-PCC). On the representation learning side, we design a Structure-Aware Target Embedding module to achieve structural alignment and effective propagation of cross-scale voxel features, thereby enhancing the expression of geometric relationships from local to global. On the probabilistic modeling side, we build a progressive bitwise target occupancy predictor that adopts a conditional autoregressive strategy to decompose each 8-bit occupancy code into four sub-codes and progressively refine the probability estimation from the most significant bits to the least significant bits, improving spatial context utilization and bit-level discrimination accuracy. Experiments on the KITTI and Ford datasets show that, at comparable reconstruction quality, SACM-PCC reduces the bitrate on KITTI by approximately 57%, 21%, and 8.7% relative to Draco, G-PCCv23, and RENO, respectively, and by approximately 54%, 21.7%, and 9% on Ford. These results demonstrate that the proposed method achieves a better rate–distortion trade-off across the full bitrate range while maintaining stable geometric reconstruction performance in complex scenes.
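The bitwise decomposition mentioned in this abstract can be illustrated with a small sketch. The abstract states only that each 8-bit occupancy code is split into four sub-codes ordered from most to least significant bits; the equal 2-bit width used here and the function names are assumptions, and the learned conditional probability model is not shown.

```python
def split_occupancy(code, n_sub=4):
    """Split an 8-bit occupancy code into n_sub equal-width sub-codes, MSB group first."""
    if not 0 <= code <= 0xFF:
        raise ValueError("occupancy code must fit in 8 bits")
    if 8 % n_sub:
        raise ValueError("n_sub must divide 8")
    width = 8 // n_sub            # assumed: equal-width groups (2 bits when n_sub is 4)
    mask = (1 << width) - 1
    return [(code >> shift) & mask for shift in range(8 - width, -1, -width)]

def merge_occupancy(sub_codes):
    """Inverse of split_occupancy for equal-width sub-codes."""
    width = 8 // len(sub_codes)
    code = 0
    for s in sub_codes:
        code = (code << width) | s
    return code

subs = split_occupancy(0b10110100)
print(subs, merge_occupancy(subs))
```

In a conditional autoregressive coder, the probability of each sub-code would be estimated given the sub-codes already decoded, so each step predicts a small alphabet (four symbols under the 2-bit assumption) rather than all 256 occupancy patterns at once.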
- Research Article
- 10.1109/tip.2026.3676604
- Jan 1, 2026
- IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
- Yun Zhang + 5 more
Conventional end-to-end learning-based point cloud compression requires training multiple models to adapt to different target bit rates. Moreover, the rate difference between geometry and attribute components of point clouds is not well-considered. In this paper, we propose an end-to-end Rate-Reconfigurable Deep Point Cloud Compression (RR-DPCC) with on/off-line Perceptual Bit Allocation Optimization (PBAO-ON/OFF), which achieves arbitrary bit rate control with one trained deep model and high-efficiency joint geometry and attribute coding. First, we propose the framework of the RR-DPCC using PBAO-ON/OFF, which includes Point Cloud Quality Assessment (PCQA) for perceptual quality measurement, PBAO-ON/OFF modules for bit allocation and RR-DPCC for high-efficiency point cloud coding. Second, we propose a one-stream network of the RR-DPCC to encode the attribute and geometry of point clouds jointly. Moreover, in RR-DPCC, a bitrate reconfigurable module is proposed to encode multiple fine-grained bitrate points with one trained model and a rate allocation module is proposed to allocate bits between geometry and attribute. Third, we propose on/off-line PBAO algorithms to maximize the perceptual quality of the reconstructed point cloud, where the bits are properly allocated based on the importance of geometry and attribute. Meanwhile, rate-distortion models (R-$\alpha$/$\beta$ and D-$\alpha$/$\beta$) are derived for high-accuracy rate control and bit allocation. Experimental results show that the proposed RR-DPCC achieves fine-grained bitrate control and allocation through a single trained model. When the proposed RR-DPCC is combined with PBAO-ON, it reduces the bit rate by 6.56% and 18.68% on average compared with the state-of-the-art V-PCC and Deep Joint Geometry and Attribute Compression (Deep-JGAC), respectively. When combined with PBAO-OFF, it achieves bit rate reductions of 4.90% and 15.34% on average, and reduces encoding/decoding time by 98.38%/22.05% and 53.75%/10.04% on average with respect to V-PCC and Deep-JGAC, respectively.
- Research Article
3
- 10.1109/jbhi.2025.3583875
- Jan 1, 2026
- IEEE journal of biomedical and health informatics
- Zixin Yang + 5 more
In image-guided liver surgery, the initial rigid alignment between preoperative and intraoperative data, often represented as point clouds, is crucial for providing sub-surface information from preoperative CT/MRI images to the surgeon during the procedure. Currently, this alignment is typically performed using semi-automatic methods, which, while effective to some extent, are prone to errors that demand manual correction. Alternatively, correspondence-based point cloud registration methods offer a promising fully automatic solution. However, they may struggle in scenarios with limited intraoperative surface visibility, a common challenge in liver surgery, particularly in laparoscopic procedures, which we refer to as complete-to-partial ambiguity. We first illustrate this ambiguity by evaluating the performance of state-of-the-art learning-based point cloud registration methods on our carefully constructed in silico and in vitro datasets. Then, we propose a patches-to-partial matching strategy as a plug-and-play module to resolve the ambiguity, which can be seamlessly integrated into learning-based registration methods without disrupting their end-to-end structure. This approach effectively improves registration performance, especially in low-visibility conditions, reducing registration errors to 6.7 mm (-29%) in silico and 12.5 mm (-40%) in vitro, compared to state-of-the-art performance achieved by Lepard of 9.5 mm and 20.7 mm, respectively. The constructed benchmark and the proposed module establish a solid foundation for advancing applications of point cloud correspondence-based registration methods in image-guided liver surgery. Our code and datasets will be released at https://github.com/zixinyang9109/P2P.
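The percentage reductions quoted in this abstract follow directly from the reported error values. The short check below assumes the percentages are relative changes with respect to the Lepard baseline; the function name is illustrative.

```python
def relative_change(new, baseline):
    """Signed relative change of new versus baseline, in percent."""
    return 100.0 * (new - baseline) / baseline

# Registration errors in millimetres as reported in the abstract.
in_silico = relative_change(6.7, 9.5)    # proposed module vs. Lepard, in silico
in_vitro = relative_change(12.5, 20.7)   # proposed module vs. Lepard, in vitro
print(round(in_silico), round(in_vitro))
```

Rounded to whole percentages, the two values agree with the reductions stated in the abstract.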
- Research Article
3
- 10.1109/tpami.2025.3594355
- Nov 1, 2025
- IEEE transactions on pattern analysis and machine intelligence
- Wei Gao + 5 more
With the maturity of 3D capture technology, the explosive growth of point cloud data has burdened storage and transmission. Traditional hybrid point cloud compression (PCC) tools relying on handcrafted priors have limited compression performance and are increasingly unable to cope with the burden induced by data growth. Recently, deep learning-based PCC methods have been introduced to continue pushing the PCC performance boundary. With deep PCC thriving, the community urgently needs a systematic overview that summarizes past progress and presents future research directions. In this paper, we present a detailed review that covers popular point cloud datasets, algorithm evolution, benchmarking analysis, and future trends. Concretely, we first introduce several widely used PCC datasets according to their major properties. Then the algorithm evolution of existing studies on deep PCC, including lossy and lossless methods proposed for various point cloud types, is reviewed. Apart from academic studies, we also investigate the development of relevant international standards (i.e., MPEG standards and JPEG standards). To support an in-depth understanding of the advances in deep PCC, we select a representative set of methods and conduct extensive experiments on multiple datasets. Comprehensive benchmarking comparisons and analysis reveal the pros and cons of previous methods. Finally, based on this analysis, we highlight the challenges and future trends of deep learning-based PCC, paving the way for further study.
- Research Article
- 10.1109/mcg.2025.3605266
- Sep 2, 2025
- IEEE computer graphics and applications
- Hao Yu + 4 more
In the field of digital orthodontics, dental models with complete roots are essential digital assets, particularly for visualization and treatment planning. However, intraoral scans typically capture only dental crowns, leaving roots missing. In this paper, we introduce a meticulously designed algorithmic pipeline to complete dental models while preserving crown geometry and mesh topology. Our pipeline begins with learning-based point cloud completion applied to existing dental crowns. We then reconstruct a complete tooth model, encompassing both the crown and root, to guide subsequent processing steps. Next, we restore the crown's original geometry and mesh topology using a strong Delaunay meshing structure; the correctness of this approach has been thoroughly established in existing literature. Finally, we optimize the transition region between crown and root using bi-harmonic smoothing. A key advantage of our approach is that the completed tooth model accurately maintains the geometry and mesh topology of the original crown, while also ensuring high-quality triangulation of dental roots.
- Research Article
- 10.5194/isprs-archives-xlviii-g-2025-1733-2025
- Aug 2, 2025
- The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
- Yidan Zhang + 5 more
Abstract. LiDAR technology is widely used in the field of autonomous driving by virtue of its high precision. However, under adverse weather conditions such as rain, snow, and fog, suspended particles in the air can contaminate the point cloud data collected by LiDAR, which leads to a significant performance degradation of the vehicle sensing system and increases the driving safety risk. To address this problem, we propose A Time and Attention-Based Point Cloud Denoising Network for Autonomous Driving in Adverse Weather (TADNet). The method is based on the 3D-OutDet network with the addition of a Convolutional Block Attention Module (CBAM), which highlights important features and suppresses minor ones. The original ResNet base network architecture is changed to a Temporal-Bottleneck ResNet (TB-ResNet) to improve the network's ability to recognize rain, snow and fog noise. We conducted comparative experiments between the proposed TADNet, a filter-based point cloud denoising method, and a deep learning-based point cloud denoising method. The experimental results show that TADNet denoises better than the other methods in all three kinds of adverse weather (rain, snow and fog): it removes different kinds of noise at different intensities while retaining the environmental features, and it achieves the best IoU and MIoU in all weather conditions.
- Research Article
8
- 10.1016/j.tust.2025.106605
- Aug 1, 2025
- Tunnelling and Underground Space Technology
- Xin Peng + 1 more
Deep learning-based point cloud semantic segmentation for tunnel face excavation areas in drilling and blasting tunnels
- Research Article
23
- 10.1016/j.autcon.2025.106218
- Jul 1, 2025
- Automation in Construction
- Hongzhe Yue + 3 more
Deep learning-based point cloud completion for MEP components
- Research Article
11
- 10.1016/j.plaphe.2025.100049
- Jun 1, 2025
- Plant Phenomics
- Kai Xie + 8 more
Reliable and automated three-dimensional segmentation of plant organs is essential for extracting phenotypic traits at the organ level. However, existing methods for plant organ segmentation predominantly rely on fully supervised learning, which still necessitates extensive point-by-point annotated datasets and fails to overcome the challenges associated with annotating plant point cloud data. In recent years, self-supervised learning-based point cloud segmentation methods have garnered widespread attention in both industry and academia because of their potential to alleviate the difficulties of point cloud data annotation to some extent. In this study, the paradigm of self-supervised learning is innovatively applied to the field of plant phenotyping through the development of the Plant-MAE, a self-supervised learning-based point cloud segmentation framework. The innovations of the Plant-MAE include a kernel-based point convolution embedding module and a multiangle feature extraction block (MAFEB) based on attention mechanisms. To validate the effectiveness of the model, extensive experiments were conducted on multiple point cloud datasets, which achieved competitive performance, with average precision, recall, F1 score, and IoU values of 92.08 %, 88.50 %, 89.80 %, and 84.03 %, respectively. The Plant-MAE outperforms advanced deep learning networks, including PointNet++, point transformer, and Point-M2AE, achieving average improvements of at least 0.53 %, 1.36 %, 0.88 %, and 2.38 % in precision, recall, F1 score, and IoU, respectively. Additionally, on the Pheno4D dataset, only half of the training data were necessary for fine-tuning to achieve performance comparable to that of the point transformer and PointNet++. This study provides technical support for the estimation of crop phenotypic parameters, thereby advancing the development of modern smart agriculture.
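The four metrics reported in this abstract can be computed from per-point labels as sketched below for a single class. This is a generic illustration, not the paper's evaluation code; the function name and the toy labels are assumptions, and how the paper averages over organs or datasets is not stated in the abstract.

```python
def binary_metrics(pred, truth, positive=1):
    """Precision, recall, F1 and IoU of one class from per-point predicted and true labels."""
    tp = sum(p == positive and t == positive for p, t in zip(pred, truth))
    fp = sum(p == positive and t != positive for p, t in zip(pred, truth))
    fn = sum(p != positive and t == positive for p, t in zip(pred, truth))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return precision, recall, f1, iou

# Hypothetical labels for eight points (1 = leaf, 0 = stem).
truth = [1, 1, 1, 0, 0, 1, 0, 0]
pred = [1, 1, 0, 0, 1, 1, 0, 0]
precision, recall, f1, iou = binary_metrics(pred, truth)
print(precision, recall, f1, iou)
```

Because IoU counts false positives and false negatives together in one denominator, it is never higher than F1 for the same predictions, which is consistent with IoU being the lowest of the four averages quoted above.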
- Research Article
- 10.1145/3728309
- May 22, 2025
- Proceedings of the ACM on Computer Graphics and Interactive Techniques
- Yao Hui Fang + 1 more
Existing learning-based point cloud upsampling methods often overlook the intrinsic data distribution characteristics of point clouds, leading to suboptimal results when handling sparse and non-uniform point clouds. We propose a novel approach to point cloud upsampling by imposing constraints from the perspective of manifold distributions. Leveraging the strong fitting capability of Gaussian functions, our method employs a network to iteratively optimize Gaussian components and their weights, accurately representing local manifolds. By utilizing the probabilistic distribution properties of Gaussian functions, we construct a unified statistical manifold to impose distribution constraints on the point cloud. Experimental results on multiple datasets demonstrate that our method generates higher-quality and more uniformly distributed dense point clouds when processing sparse and non-uniform inputs, outperforming state-of-the-art point cloud upsampling techniques.
- Research Article
18
- 10.1609/aaai.v39i12.33387
- Apr 11, 2025
- Proceedings of the AAAI Conference on Artificial Intelligence
- Kangli Wang + 1 more
Learning-based point cloud compression methods have made significant progress in terms of performance. However, these methods still encounter challenges including high complexity, limited compression modes, and a lack of support for variable rate, which restrict the practical application of these methods. In order to promote the development of practical point cloud compression, we propose an efficient unified point cloud geometry compression framework, dubbed as UniPCGC. It is a lightweight framework that supports lossy compression, lossless compression, variable rate and variable complexity. First, we introduce the Uneven 8-Stage Lossless Coder (UELC) in the lossless mode, which allocates more computational complexity to groups with higher coding difficulty, and merges groups with lower coding difficulty. Second, Variable Rate and Complexity Module (VRCM) is achieved in the lossy mode through joint adoption of a rate modulation module and dynamic sparse convolution. Finally, through the dynamic combination of UELC and VRCM, we achieve lossy compression, lossless compression, variable rate and complexity within a unified framework. Compared to the previous state-of-the-art method, our method achieves a compression ratio (CR) gain of 8.1% on lossless compression, and a Bjontegaard Delta Rate (BD-Rate) gain of 14.02% on lossy compression, while also supporting variable rate and variable complexity.
- Research Article
3
- 10.3390/wevj16020080
- Feb 5, 2025
- World Electric Vehicle Journal
- Yiqi Xu + 3 more
Intelligent driving research has devoted considerable attention to point cloud obstacles because, unlike image data, point clouds are high-dimensional data that can adequately depict the shape and placement of obstacles. Currently, deep learning technology is primarily employed for point cloud obstacle classification in autonomous vehicles. These techniques typically suffer from low classification accuracy, low processing efficiency, and poor model stability. To tackle these issues, this paper proposes a novel random forest algorithm that integrates the out-of-bag error theory and can consistently and accurately evaluate the influence of point cloud properties. Then, building on the novel algorithm, this paper proposes a modified PointNet network that incorporates the effects of both global and local features on the classification task, thereby increasing the conventional network’s classification accuracy. To assess the effectiveness of this approach in the experimental portion, we set up an evaluation system based on average accuracy, overall accuracy, and a confusion matrix. According to the simulation results, the proposed network achieves an overall accuracy of 94.4% and an average accuracy of 84.9%, which are compared with those of the prototype PointNet and its variants. The classification accuracies for the four types of obstacles are 97.6%, 63.6%, 92.5%, and 86.1%. In addition, the proposed method is effective at improving both the computational complexity and stability of the network.
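The gap between overall and average accuracy in this abstract can be illustrated from a confusion matrix. The sketch below assumes that average accuracy means the unweighted mean of per-class accuracies, which the abstract does not define explicitly; the function name and the toy matrix are illustrative.

```python
def overall_and_average_accuracy(confusion):
    """confusion[i][j] is the number of class-i samples predicted as class j."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    # Per-class accuracy: correctly classified samples of a class over all samples of that class.
    per_class = [row[i] / sum(row) if sum(row) else 0.0
                 for i, row in enumerate(confusion)]
    return correct / total, sum(per_class) / len(per_class), per_class

# Hypothetical two-class matrix with a large, easy class and a small, hard one.
toy = [[90, 10],
       [5, 5]]
overall, average, per_class = overall_and_average_accuracy(toy)
print(overall, average, per_class)

# Unweighted mean of the four per-class accuracies reported in the abstract.
reported_mean = (97.6 + 63.6 + 92.5 + 86.1) / 4
print(reported_mean)
```

Under that assumption, the mean of the four reported class accuracies is about 84.95%, consistent with the stated 84.9% average accuracy. The higher overall figure then reflects that overall accuracy weights classes by their sample counts, so a weak class with few samples lowers it less.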
- Research Article
5
- 10.1109/access.2025.3549316
- Jan 1, 2025
- IEEE Access
- André F R Guarda + 2 more
Efficient point cloud coding has become increasingly critical for multiple applications such as virtual reality, autonomous driving, and digital twin systems, where rich and interactive 3D data representations may functionally make the difference. Deep learning has emerged as a powerful tool in this domain, offering advanced techniques for compressing point clouds more efficiently than conventional coding methods while also allowing effective computer vision tasks performed in the compressed domain thus, for the first time, making available a common compressed visual representation effective for both man and machine. Taking advantage of this potential, JPEG has recently finalized the JPEG Pleno Learning-based Point Cloud Coding (PCC) standard offering efficient lossy coding of static point clouds, targeting both human visualization and machine processing by leveraging deep learning models for geometry and color coding. The geometry is processed directly in its original 3D form using sparse convolutional neural networks, while the color data is projected onto 2D images and encoded using the also learning-based JPEG AI standard. The goal of this paper is to provide a complete technical description of the JPEG PCC standard, along with a thorough benchmarking of its performance against the state-of-the-art, while highlighting its main strengths and weaknesses. In terms of compression performance, JPEG PCC outperforms the conventional MPEG PCC standards, especially in geometry coding, achieving significant rate reductions. Color compression performance is less competitive but this is overcome by the power of a full learning-based coding framework for both geometry and color and the associated effective compressed domain processing.
- Research Article
1
- 10.1109/tmm.2025.3542987
- Jan 1, 2025
- IEEE Transactions on Multimedia
- Lizhi Hou + 3 more
Recently, numerous learning-based point cloud compression methods with outstanding performance have been developed. The majority of them concentrate on point cloud geometry compression, and several works have demonstrated advances in the color attribute compression for dense point clouds. However, compression of the reflectance attribute attached to the point captured by the light detection and ranging (LiDAR) sensors remains a major challenge. In this article, we present a lossless reflectance compression method for LiDAR point clouds (LPCs) that learns reflectance probability distributions with a deep hierarchical k-nearest-neighbors (KNN) context model, namely, the HK-PCRC. We first represent the original LPC with a series of hierarchical layers. Relying on the hierarchical structure, points in the same layer are coded in parallel by referencing the points in the previously coded layers. The approach balances the coding efficiency and time complexity while also supporting the progressive coding functionality. By introducing the KNN context, the context size is significantly reduced, which eases the computational burden while maintaining the coding performance. To enrich the context information, we further search for enhanced neighbors for each point in the context window. For each enhanced neighbor, in addition to its reflectance value, the relative distance, elevation angle, and local density are further collected. Then, a transformer-style sequential model is applied to construct an accurate deep context model. Furthermore, to efficiently fuse context features from different sources, a cross-feature fusion attention mechanism is designed for the transformer network. The comprehensive experimental results on SemanticKITTI, a large scale LiDAR benchmark, and Ford, an MPEG-specified dataset, demonstrate that our proposed framework achieves a state-of-the-art reflectance lossless compression performance, with average bit savings of 11.3% and 9.6% when compared to the state-of-the-art hand-crafted methods.
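The KNN context idea in this abstract, where each point is coded with reference to a few nearby points from previously coded layers, can be sketched as a brute-force neighbour lookup. This shows only the neighbour selection step under assumed data structures; the paper's hierarchical layering, enhanced-neighbor features, and transformer-style context model are not reproduced here.

```python
import math

def knn_context(target_xyz, coded_points, k):
    """Return the k previously coded (xyz, reflectance) entries nearest to target_xyz."""
    return sorted(coded_points, key=lambda item: math.dist(item[0], target_xyz))[:k]

# Hypothetical already-coded points: (position, reflectance value).
coded = [((0.0, 0.0, 0.0), 10), ((1.0, 0.0, 0.0), 20),
         ((5.0, 5.0, 5.0), 200), ((0.0, 1.0, 0.0), 30)]
context = knn_context((0.2, 0.1, 0.0), coded, k=2)
context_reflectances = [refl for _, refl in context]
print(context_reflectances)
```

Restricting the context to k neighbours keeps the input to the probability model at a fixed, small size regardless of how many points have already been coded, which matches the reduced context size the abstract describes.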
- Research Article
1
- 10.1109/tmm.2025.3565958
- Jan 1, 2025
- IEEE Transactions on Multimedia
- Pengpeng Yu + 4 more
The growth of 3D point cloud applications requires efficient compression techniques for high-quality and low-latency services. Recently, learning-based point cloud compression models have made significant progress. However, geometric distortion resulting from downsampling limits the feature depth within large-scale point clouds, thereby constraining the receptive field and suppressing redundancy removal. Moreover, the issues of computational efficiency and reconstruction quality still persist in the compression of large-scale point clouds. To address these challenges, we propose a hierarchical distortion learning framework for end-to-end lossy compression of point clouds. First, we design a feature residual compression module to efficiently transmit shallow semantics between the encoder and the decoder, which enables a lightweight design of our framework. Second, we introduce a geometry residual compression module to progressively complement the reconstruction distortion, avoiding the accumulation of geometric distortion. By integrating these two modules and employing sufficient downsampling processes, we develop a high-performance framework with a significantly enlarged receptive field and low computational cost. Extensive experiments demonstrate that our method achieves state-of-the-art performance in geometry lossy compression, while delivering competitive performance in joint geometry and color lossy compression with fast running speed. Code is available at https://github.com/pengpeng-yu/FastPCC.
- Research Article
- 10.1109/tmm.2025.3598605
- Jan 1, 2025
- IEEE Transactions on Multimedia
- Mohammadreza Ghafari + 3 more
Attention models, particularly Transformers, have significantly advanced deep learning in fields like natural language processing and computer vision by capturing contextual relationships in both sequential and spatial data. This ability is valuable for Point Clouds (PC), which are unstructured sets of points in 3D space. Transformers can effectively identify correlations between distant points, allowing them to focus on the most critical regions of the data. To demonstrate this capability, this paper proposes a novel, scalable Graph-Guided Transformer model, labeled 2GFormer, for static PC geometry. This model is built using a scalable architecture that leverages Graph Convolutions to enhance a Relational Neighborhood Self-Attention (RNSA) base layer model. Both models are integrated into the JPEG Pleno Learning-based Point Cloud Coding (JPEG PCC) standard, resulting in the creation of two attention-enabled codecs for static PC coding: JPEG RNSA and JPEG 2GFormer. While the JPEG RNSA codec delivers significant compression improvements for solid and dense PCs compared to the baseline JPEG PCC standard, JPEG 2GFormer extends these gains to solid, dense, and sparse PCs with only a marginal increase in model parameters. Additionally, JPEG 2GFormer outperforms both conventional and learning-based state-of-the-art PC codecs. These results position JPEG 2GFormer as a highly efficient solution for versatile PC coding.
- Research Article
3
- 10.1109/tgrs.2025.3573023
- Jan 1, 2025
- IEEE Transactions on Geoscience and Remote Sensing
- Jingxiang Li + 6 more
Deep learning-based point cloud segmentation methods have been extensively explored, but the majority focus either on local or global feature learning, with few integrating both. These integrated approaches have not been sufficiently explored in complex mountainous scenes with low feature heterogeneity. To address this gap, we propose a novel point-based multi-scale spatial Convolution-Swin Transformer network (Point-SCT). Point-SCT combines convolutional local geometric detail capture with global relationship modeling via dynamic window interactions in Transformer, enhancing ground filtering accuracy in challenging mountainous scenes. The encoder incorporates convolution-based Multi-Scale Local Feature Aggregation (MLFA) approach, integrating Local Geometric Feature Encoding (LGSE) and Diluent Pooling (DP) strategies to effectively aggregate local detailed geometric features while suppressing irrelevant feature vectors and enhancing the representation of low-heterogeneity feature. Additionally, the dynamic spatial window strategy within the Transformer facilitates the capture of long-range feature dependencies. To mitigate noise introduced by RGB in point cloud overlays and sharpen geometric distinctions between the ground and low-lying vegetation, we introduce Boundary Detector, Curvature, and Average Elevation (BCE) as prior inputs, replacing RGB. Finally, quantitative and qualitative analyses of Point-SCT are conducted on an airborne laser scanning (ALS) dataset from a mountainous area, with ablation studies validating the effectiveness of LGSE, DP and BCE. The comprehensive experiments demonstrate that Point-SCT robustly segments ground points in complex mountainous scenes, achieving state-of-the-art levels of accuracy and generalization.
- Research Article
- 10.1109/tgrs.2025.3549492
- Jan 1, 2025
- IEEE Transactions on Geoscience and Remote Sensing
- Jinhao Lu + 8 more
With the rapid advancement of 3-D acquisition technology, 3-D change detection has recently gained considerable attention. Existing deep learning-based point cloud change detection methods usually adopt a common encoder-decoder structure to learn pointwise features. However, these feature learning backbones are not specifically designed for the change detection task, and they ignore local structure discrepancies during feature learning. To address these issues, this article proposes a multiscale difference-aware network (Ms-DANet) for 3-D point cloud change detection. First, we propose a difference-guided multiscale feature learning (DG-MsFL) module to enhance the feature differences between bi-temporal point clouds at multiple scales during feature encoding, and use these differences to guide the network to focus more on the local structures with large discrepancies. Next, we introduce a multiscale difference feature fusion (Ms-DFF) module to fuse the multiscale feature differences to learn more discriminative features during feature decoding. Finally, we treat the point cloud change detection task as a semantic classification problem, and propose a multiscale loss (Ms-Loss) function to promote the network training. We conduct experiments on the real-world street-level point cloud change detection dataset SLPCCD and the simulated airborne urban point cloud change detection dataset URB3DCD. The experimental results show that Ms-DANet obtains a significant improvement on both the real-world and simulated point cloud change detection datasets, demonstrating its effectiveness and robustness across various sensors and data modalities.