Articles published on Semantic Segmentation
13138 Search results
- New
- Research Article
- 10.1080/01431161.2026.2619148
- Jan 22, 2026
- International Journal of Remote Sensing
- Xiaoxuan Huang + 3 more
ABSTRACT Wildfire spread prediction is a critical task in remote sensing image analysis, where accurately identifying newly expanded wildfire regions is essential for real-time monitoring and emergency response. In this study, we propose a semantic segmentation network, termed the Residual Contextual Dual Attention Network (RCDA-Net), for predicting incremental wildfire spread regions from multi-source remote sensing data. RCDA-Net integrates two attention modules, Contextual Anchor Attention (CAA) and Adaptive Graph Channel Attention (AGCA), to enhance spatial structure modelling and inter-channel dependency learning across spectral, topographic, and meteorological inputs. We construct and release a large-scale wildfire dataset covering North America, which serves as a benchmark for incremental fire spread prediction. Experimental results on this dataset show that RCDA-Net achieves an F1-score of 0.471 and an IoU of 0.308, outperforming established models such as U-Net, AttU-Net, WPN, and FU-NetCast. With 8.9 million parameters and an inference speed of 131 frames per second (fps), RCDA-Net provides a favourable balance between segmentation accuracy and computational efficiency. Ablation studies validate the complementary effects of CAA and AGCA, while additional analyses demonstrate that Dice Loss effectively mitigates class imbalance and boundary ambiguity. Robustness evaluations further indicate that wildfire masks and meteorological factors play a dominant role in predictive performance under partial input degradation. The source code is publicly available at: https://github.com/hxxAlways/RCDA-Net.
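The abstract credits Dice Loss with mitigating class imbalance between small burned increments and large backgrounds. For reference, a minimal sketch of a standard soft Dice loss for binary fire masks follows; RCDA-Net's exact formulation is not given in the abstract, so this is the textbook variant, not the paper's code.

```python
# Textbook soft Dice loss for binary segmentation (not RCDA-Net's own code).
import torch

def dice_loss(logits: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """logits: (N, 1, H, W) raw scores; target: (N, 1, H, W) binary fire mask."""
    probs = torch.sigmoid(logits)
    inter = (probs * target).sum(dim=(1, 2, 3))
    union = probs.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
    dice = (2 * inter + eps) / (union + eps)
    return 1 - dice.mean()  # small classes weigh as much as large ones
```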
- New
- Research Article
- 10.3390/su18021099
- Jan 21, 2026
- Sustainability
- Zhe Liu + 4 more
Historic waterfront streets are not only an important component of urban public spaces but also highlight the distinctive features and historical contexts of a city. High-quality streetscape visual perception plays a crucial role in advancing the cultural, social, environmental, and economic sustainability of urban street space. This study constructs a multi-dimensional, multi-scale comprehensive evaluation framework to assess the visual quality of waterfront streets, taking the "Water City" of Liaocheng as a typical case. Semantic segmentation, sDNA (Spatial Design Network Analysis), GIS (Geographic Information System), and statistical analysis were employed. Following the extraction and classification of street space elements, a multi-dimensional evaluation index system covering natural coordination, artificial comfort, and historical culture was established for the visual assessment. Space syntax analysis was performed on the waterfront streets using sDNA to quantify macro-scale spatial structure and meso-scale pedestrian accessibility. The results of micro-scale visual perception, meso-scale behavioral walkability, and macro-scale spatial structure were integrated to construct a multi-scale diagnostic framework covering eight classifications. This framework provides a scientific basis for putting forward refined and sustainable optimization strategies for historic waterfront streets.
- New
- Research Article
- 10.1109/tvcg.2026.3656737
- Jan 21, 2026
- IEEE transactions on visualization and computer graphics
- Kai Cheng + 4 more
Monocular dynamic video reconstruction is a typical ill-posed problem due to the limited observations and complex 3D motions. Despite the recent advances in dynamic 3D Gaussian splatting techniques, most of them still struggle with the monocular setting, since they heavily rely on geometric cues from multiple cameras or ignore the structural coherence among the optimized 3D Gaussians. To address this, we propose Hie4DGS, a novel hierarchical structure representation to model the complex dynamic motions from monocular dynamic videos. Specifically, we decompose the motions of a dynamic scene into groups of multiple structure granularities and progressively compose them to derive the motion of each 3D Gaussian. Building on this representation, we leverage hierarchical semantic segmentation to group Gaussians and initialize their motion using depth and tracking priors within each group. Additionally, we introduce a structure rendering loss that enforces consistency between the learned motion structure and semantic priors, further reducing motion ambiguity. Compared to the state-of-the-art dynamic Gaussian methods, we achieve significant improvement in rendering quality on monocular video datasets featuring complex real-world motions.
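The core idea above, decomposing scene motion into groups at several granularities and composing them per Gaussian, can be sketched as a coarse-to-fine product of per-group rigid transforms. Everything below (names, the SE(3) parameterization, the composition order) is an illustrative guess at that structure, not Hie4DGS's implementation.

```python
# Illustrative coarse-to-fine composition of per-group SE(3) motions.
import torch

def compose_motion(level_transforms, level_assignments):
    """level_transforms: list (coarse to fine) of (G_l, 4, 4) per-group matrices.
    level_assignments: list of (N,) long tensors mapping Gaussians to groups.
    Returns (N, 4, 4) per-Gaussian transforms."""
    n = level_assignments[0].shape[0]
    out = torch.eye(4).expand(n, 4, 4).clone()
    for T, idx in zip(level_transforms, level_assignments):
        out = T[idx] @ out  # finer levels refine the coarser group motion
    return out
```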
- New
- Research Article
- 10.1007/s11370-025-00689-9
- Jan 20, 2026
- Intelligent Service Robotics
- Sol Han + 3 more
Abstract In recent years, maritime autonomous surface vessels (ASVs) have garnered increasing attention due to advancements in sensor technologies and deep learning. A key challenge in developing safe maritime vessel navigation is the ability to perceive the dynamic maritime environment, which involves tasks such as obstacle detection, recognition of navigational elements, and situational awareness. This paper reviews a range of studies on maritime datasets and perception algorithms, many of which present their own datasets alongside perception models. The datasets are organized based on sensor configurations and task objectives, as well as the algorithms used for tasks such as object detection, semantic segmentation, target tracking, multimodal sensor fusion, and simultaneous localization and mapping (SLAM). This survey provides a comprehensive overview of recent advancements in maritime perception from a robotics perspective and offers valuable insights to guide future research toward the development of safe and reliable autonomous ship navigation systems.
- New
- Research Article
- 10.1088/2631-8695/ae3a3e
- Jan 19, 2026
- Engineering Research Express
- Yuxin Qin + 5 more
Abstract This paper addresses the problems of unstable feature extraction, matching failure, and trajectory drift in traditional visual SLAM systems under complex lighting and dynamic environments. A semantic edge-constrained visual localization system based on MSRCR image enhancement, termed MYSC-SLAM, is proposed. In the front-end, an improved MSRCR algorithm is employed to achieve adaptive brightness equalization and multi-scale image enhancement, which significantly improves the texture details and feature visibility in low-light and high-contrast regions. Simultaneously, the YOLOv8-seg detection model is introduced to generate masks for dynamic objects, and combined with semantic segmentation results to perform semantic-guided edge extraction, effectively suppressing mismatches caused by dynamic targets. In the back-end optimization stage, a feature weighting strategy based on semantic category constraints is adopted to assign differentiated weights to feature points from various semantic regions, allowing static structural features to dominate the pose optimization process, thereby improving the overall stability and accuracy of trajectory estimation. Experimental results demonstrate that MYSC-SLAM achieves superior localization accuracy and robustness compared with ORB-SLAM3 and DynaSLAM on public datasets such as TUM, EuRoC, and OIVIO, verifying the effectiveness and practical value of the proposed method in complex illumination and dynamic scenarios.
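MYSC-SLAM's front-end builds on MSRCR image enhancement. For orientation, here is a compact sketch of the classical MSRCR (Multi-Scale Retinex with Color Restoration) that the improved algorithm starts from; the scale and gain constants are common defaults, and the paper's adaptive brightness equalization is not reproduced here.

```python
# Classical MSRCR baseline (the paper's improved, adaptive variant differs).
import cv2
import numpy as np

def msrcr(img, sigmas=(15, 80, 250), alpha=125.0, beta=46.0):
    img = img.astype(np.float64) + 1.0  # avoid log(0)
    msr = np.zeros_like(img)
    for s in sigmas:  # multi-scale retinex: average single-scale retinex outputs
        msr += (np.log(img) - np.log(cv2.GaussianBlur(img, (0, 0), s))) / len(sigmas)
    crf = beta * (np.log(alpha * img) - np.log(img.sum(axis=2, keepdims=True)))
    out = msr * crf  # color restoration re-weights the retinex output
    out = (out - out.min()) / (out.max() - out.min() + 1e-8)
    return (out * 255).astype(np.uint8)
```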
- New
- Research Article
- 10.1080/1448837x.2026.2616567
- Jan 19, 2026
- Australian Journal of Electrical and Electronics Engineering
- Honghui Xie + 3 more
ABSTRACT In recent years, domain-specific dialogue systems have shown strong potential for industrial use, but many existing models struggle with professional accuracy and deep contextual understanding, particularly in knowledge-intensive fields such as the new energy industry. To address this limitation, we propose a domain-oriented dialogue generation and evaluation framework that integrates expert knowledge modelling with large pre-trained language models. First, a domain-specific knowledge graph is constructed from technical standards, policy documents, and expert reports, with entities and relationships embedded using graph representation methods such as TransE and GraphSAGE. We then develop a dialogue generation module based on LLaMA and ChatGLM, incorporating domain terminology adaptation and knowledge injection to improve response accuracy. A hybrid strategy combining semantic segmentation, dialogue state tracking, and context-aware memory networks is introduced to support robust multi-turn dialogue management. Furthermore, a multi-dimensional evaluation framework is designed, including automatic metrics (BLEU, BERTScore, ProMatch, and knowledge recall) and human evaluation criteria (fluency, accuracy, and domain relevance). Experiments on two curated datasets demonstrate clear improvements over baseline methods, achieving a 5.2% F1 gain in named entity recognition and reducing manual document processing time by more than 80%.
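Among the graph representation methods named above, TransE has a particularly compact formulation: a triple (head, relation, tail) is plausible when h + r ≈ t. A minimal sketch of its scoring and margin-ranking loss follows; the framework's actual training setup is not described at this level of detail.

```python
# Minimal TransE scoring and margin-ranking loss (illustrative sketch).
import torch

def transe_score(h: torch.Tensor, r: torch.Tensor, t: torch.Tensor, p: int = 1) -> torch.Tensor:
    """h, r, t: (B, d) embeddings. Lower distance = more plausible triple."""
    return torch.norm(h + r - t, p=p, dim=-1)

def margin_loss(pos: torch.Tensor, neg: torch.Tensor, margin: float = 1.0) -> torch.Tensor:
    """pos/neg: scores of true triples and corrupted (negative) triples."""
    return torch.relu(margin + pos - neg).mean()
```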
- New
- Research Article
- 10.1038/s41598-026-36445-x
- Jan 18, 2026
- Scientific reports
- Jiyan Zhang + 5 more
Few-shot semantic segmentation has gained significant attention in metal surface defect detection due to its ability to segment unseen object classes with only a few annotated defect samples. Previous methods constrained to single-episode training suffer from limited adaptability in semantic description of defect regions and coarse segmentation granularity. In this paper, we propose an episode-adaptive memory network (EAMNet) that specifically addresses subtle variances between episodes during training. The episode adaptive memory unit (EAMU) leverages an adaptive factor to model semantic dependencies across different episodes. The context adaptation module (CAM) aggregates hierarchical features of support-query pairs for fine-grained segmentation. The proposed global response mask average pooling (GRMAP) introduces a global response normalization to obtain fine-grained cues directly from the support prototype. We also introduce an attention distillation (AD), which leverages fine-grained semantic attention correspondence to process defect region cues and stabilize the cross-episode adaptation in EAMU. Extensive experiments demonstrate that our approach establishes new state-of-the-art performance on both Surface Defect-[Formula: see text] and FSSD-12 datasets.
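GRMAP is described as adding global response normalization on top of support-prototype extraction. The masked average pooling it builds on is standard in prototype-based few-shot segmentation and is sketched below; the normalization itself and the paper's other modules are not reproduced.

```python
# Standard masked average pooling for a support prototype (GRMAP's starting point).
import torch

def masked_average_pooling(feat: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """feat: (B, C, H, W) support features; mask: (B, 1, H, W) binary defect mask.
    Returns (B, C) prototypes averaged over the masked (defect) pixels only."""
    return (feat * mask).sum(dim=(2, 3)) / (mask.sum(dim=(2, 3)) + 1e-6)
```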
- New
- Research Article
- 10.1038/s41598-025-29931-1
- Jan 17, 2026
- Scientific Reports
- Cristina Cărunta + 3 more
Real-time semantic segmentation of driving scenes via effective attention-based information fusion and hybrid encoder
- New
- Research Article
- 10.1515/cppm-2025-0287
- Jan 15, 2026
- Chemical Product and Process Modeling
- Yancheng Li + 2 more
Abstract This paper presents a thorough description of a highly dependable object recognition model for industrial applications in difficult environments, based on a combination of stereo vision, semantic segmentation, and a hybrid CNN–ViT architecture. The pipeline performs pixel-level segmentation using DeepLabV3+, extracts RGB-D features with VGG19, and achieves final classification through a concise CNN–ViT fusion module. Solid semantic segmentation and stereo depth-aware modelling offer adaptable and effective solutions to perennial challenges in industrial recognition, such as occlusion, lighting fluctuations, and complex backdrops that can undermine performance. The proposed pipeline has the potential to enhance depth-based perception and spatial reasoning for identification of and interaction with industrial objects in at least three modes: recognition, cognition, and classification. The model was evaluated on the XYZ-IBD dataset, achieving an overall accuracy of 96.54%, an F1-score of 0.956, and an AUC of 0.996, indicating a significant advantage over existing 3D deep-learning-based recognition models and those based on binocular images. The combined semantic segmentation and stereo depth approach offers a robust architecture that enhances perception and accuracy for Industry 4.0-driven industrial robotic applications. Performance gains were confirmed by comparing the model with baseline approaches such as the Bilateral Vision-Aided Transformer Network and binocular Mask R-CNN, where it achieved higher accuracy, F1-score, and AUC. The framework also introduces a compact RGB-D fusion design and hybrid CNN–ViT architecture that improve robustness and recognition reliability in complex industrial settings.
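The three stages named above (DeepLabV3+ segmentation, VGG19 feature extraction, CNN–ViT fusion) can be wired together with off-the-shelf torchvision models, as the hedged skeleton below shows; the paper's trained weights, RGB-D input handling, and fusion head are not public in the abstract, so the stand-ins here are illustrative only.

```python
# Pipeline skeleton with torchvision stand-ins (not the paper's trained model).
import torch
from torchvision.models import vgg19
from torchvision.models.segmentation import deeplabv3_resnet50

segmenter = deeplabv3_resnet50(weights="DEFAULT").eval()
backbone = vgg19(weights="DEFAULT").features.eval()

@torch.no_grad()
def recognize(rgb: torch.Tensor):
    """rgb: (1, 3, H, W) normalized image."""
    seg = segmenter(rgb)["out"].argmax(1)      # stage 1: pixel-level mask
    feats = backbone(rgb)                      # stage 2: convolutional features
    tokens = feats.flatten(2).transpose(1, 2)  # stage 3: tokens for a ViT fusion head
    return seg, tokens
```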
- New
- Research Article
- 10.3390/app16020840
- Jan 14, 2026
- Applied Sciences
- Huiwen Dong + 1 more
Semantic segmentation of laparoscopic images requires costly pixel-level annotations, which are often unavailable for real surgical data. This gives rise to an unsupervised domain adaptation scenario, where labeled synthetic images serve as the source domain and unlabeled real images as the target. We propose a frequency-aware unsupervised domain adaptation framework to mitigate the domain gap between simulated and real laparoscopic images. Specifically, we introduce a Radial Frequency Masking module that selectively masks frequency components of real images, and employ a Mean Teacher framework to enforce consistency between high- and low-frequency representations. In addition, we propose a module called Fourier Domain Adaptation-Blend, a style transfer strategy based on low-frequency blending, and apply entropy minimization to enhance prediction confidence on the target domain. Experiments are conducted on public datasets by jointly training on simulated and real laparoscopic images. Our method consistently outperforms representative baselines. These results demonstrate the effectiveness of frequency-aware adaptation in surgical image segmentation without relying on manual annotations from the target domain.
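A radial frequency mask of the kind named above can be realized by thresholding distance from the spectrum center after an FFT shift; the cut-off radius below is an arbitrary assumption, since the paper's masking schedule is not given in the abstract.

```python
# Illustrative radial low/high-frequency split (cut-off radius is a guess).
import torch

def radial_frequency_split(img: torch.Tensor, radius_frac: float = 0.1):
    """img: (B, C, H, W). Returns (low-pass, high-pass) image reconstructions."""
    B, C, H, W = img.shape
    f = torch.fft.fftshift(torch.fft.fft2(img), dim=(-2, -1))
    yy, xx = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    dist = torch.sqrt((yy - H / 2) ** 2 + (xx - W / 2) ** 2)
    mask = (dist <= radius_frac * min(H, W)).to(img.dtype)
    low = torch.fft.ifft2(torch.fft.ifftshift(f * mask, dim=(-2, -1))).real
    high = torch.fft.ifft2(torch.fft.ifftshift(f * (1 - mask), dim=(-2, -1))).real
    return low, high
```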
- New
- Research Article
- 10.1038/s40494-025-02292-8
- Jan 13, 2026
- npj Heritage Science
- Rui Liu + 5 more
Learning discriminative universal background knowledge for few-shot point cloud semantic segmentation of architectural cultural heritage
- New
- Research Article
- 10.3389/fpls.2025.1727626
- Jan 13, 2026
- Frontiers in Plant Science
- Dejing Zhou + 10 more
Introduction: Pine wilt disease (PWD) is a highly destructive infectious disease that severely damages pine forests worldwide. Because symptoms emerge first in the tree crown, detection from unmanned aerial vehicles (UAVs) is efficient. However, most methods perform only binary classification and lack pixel-level staging, which leads to missed initial symptoms and confusion with other species.
Methods: We propose MSCF-LUNet, a lightweight three-stage semantic segmentation model based on multi-scale context fusion. The model uses an improved multi-scale patch embedding guided by attention with relative position encoding (AWRP) to adapt the sampling field of view and to fuse local details with global context. Under contextual attention, the network learns fine-grained features and location cues.
Results: In complex environments, MSCF-LUNet achieves 89.56% precision, 92.13% recall, 88.92% intersection over union (IoU), and 96.54% pixel accuracy (PA), balancing performance and computational cost.
Discussion: The model effectively segments PWD-infected regions and determines disease stages from remote-sensing imagery.
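The IoU and pixel-accuracy figures reported above have standard definitions; for a binary PWD-infected mask they reduce to the following sketch (routine metric code, not the paper's evaluation script).

```python
# Standard binary IoU and pixel accuracy (routine metric definitions).
import numpy as np

def iou_pa(pred: np.ndarray, gt: np.ndarray):
    """pred, gt: (H, W) boolean masks of the infected class."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / max(union, 1), (pred == gt).mean()
```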
- New
- Research Article
- 10.3390/s26020531
- Jan 13, 2026
- Sensors
- Boxu Li + 2 more
With rapid advancements in sub-meter satellite and aerial imaging technologies, high-resolution remote sensing imagery has become a pivotal source for geospatial information acquisition. However, current semantic segmentation models encounter two primary challenges: (1) the inherent trade-off between capturing long-range global context and preserving precise local structural details—where excessive reliance on downsampled deep semantics often results in blurred boundaries and the loss of small objects and (2) the difficulty in modeling complex scenes with extreme scale variations, where objects of the same category exhibit drastically different morphological features. To address these issues, this paper introduces MAFMamba, a multi-scale adaptive fusion visual Mamba network tailored for high-resolution remote sensing images. To mitigate scale variation, we design a lightweight hybrid encoder incorporating an Adaptive Multi-scale Mamba Block (AMMB) in each stage. Driven by a Multi-scale Adaptive Fusion (MSAF) mechanism, the AMMB dynamically generates pixel-level weights to recalibrate cross-level features, establishing a robust multi-scale representation. Simultaneously, to strictly balance local details and global semantics, we introduce a Global–Local Feature Enhancement Mamba (GLMamba) in the decoder. This module synergistically integrates local fine-grained features extracted by convolutions with global long-range dependencies modeled by the Visual State Space (VSS) layer. Furthermore, we propose a Multi-Scale Cross-Attention Fusion (MSCAF) module to bridge the semantic gap between the encoder’s shallow details and the decoder’s high-level semantics via an efficient cross-attention mechanism. Extensive experiments on the ISPRS Potsdam and Vaihingen datasets demonstrate that MAFMamba surpasses state-of-the-art Convolutional Neural Network (CNN), Transformer, and Mamba-based methods in terms of mIoU and mF1 scores. Notably, it achieves superior accuracy while maintaining linear computational complexity and low memory usage, underscoring its efficiency in complex remote sensing scenarios.
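The MSAF mechanism is said to generate pixel-level weights that recalibrate cross-level features. One common way to realize that, shown below purely as an assumption about the general shape of such a module, is a 1×1 convolution producing a per-pixel softmax over scales.

```python
# Assumed shape of pixel-level adaptive fusion (not MAFMamba's actual MSAF).
import torch
import torch.nn as nn

class PixelAdaptiveFusion(nn.Module):
    """Fuses S same-shape feature maps with per-pixel softmax weights."""
    def __init__(self, channels: int, num_scales: int):
        super().__init__()
        self.score = nn.Conv2d(channels * num_scales, num_scales, kernel_size=1)

    def forward(self, feats):  # feats: list of S tensors, each (B, C, H, W)
        w = torch.softmax(self.score(torch.cat(feats, dim=1)), dim=1)  # (B, S, H, W)
        return sum(w[:, i:i + 1] * f for i, f in enumerate(feats))
```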
- New
- Research Article
- 10.1186/s13007-025-01495-1
- Jan 13, 2026
- Plant methods
- Yiding Zhang + 5 more
Microscopic imaging provides essential visual evidence for pathogen monitoring, but its shallow depth of field and the three-dimensional height variation of spores lead to pronounced defocus blur and structural degradation. Restoring such images is therefore crucial for reliable spore identification and downstream analysis. However, microscopic defocus is a spatially varying process that severely suppresses high-frequency structures, causing natural-image deblurring models to generalize poorly. In addition, optical constraints of microscopy make realistic sharp-blur pairs difficult to obtain, further limiting learning-based restoration. To address these challenges, we propose MicroDeblurNet, a single-image deblurring network specifically designed for microscopic defocus restoration. The model incorporates a convolutional block attention module to enhance spatial selectivity toward key pathogen structures, and employs depthwise over-parameterized convolutions to capture locally varying blur patterns more effectively, enabling spatially consistent and structurally coherent restoration. Furthermore, a spatial-frequency consistency loss is proposed to strengthen high-frequency detail recovery while maintaining color fidelity and morphological integrity. To support high-fidelity supervision, we propose a paired-data construction strategy based on Laplacian-pyramid fusion and construct a clear-blur microscopic dataset for cucumber downy mildew. The restored outputs of MicroDeblurNet are further applied to sporangia detection and semantic segmentation to evaluate their impact on high-level visual tasks. Finally, we build an integrated microscopic analysis platform that delivers standardized high-quality data and automated pathogen-structure recognition and analysis to support disease assessment and management. Experimental results demonstrate that MicroDeblurNet achieves an optimal balance across pixel-level, structure-level, and perception-level metrics, reaching a PSNR of 42.48 dB and SSIM of 0.9839, outperforming advanced state-of-the-art methods. In downstream tasks, MicroDeblurNet delivers higher detection recall and segmentation accuracy in challenging scenarios involving sporangia adhesion and background impurities, demonstrating its ability to enhance target discernibility, preserve structural completeness, and improve robustness under complex microscopic conditions.
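A spatial-frequency consistency loss of the kind proposed above typically pairs a pixel-space term with a term on FFT magnitudes; the sketch below follows that common pattern, with the balance weight lam as an assumed hyperparameter rather than the paper's value.

```python
# Common spatial + frequency-magnitude loss pattern (weights are assumptions).
import torch
import torch.nn.functional as F

def spatial_frequency_loss(pred: torch.Tensor, sharp: torch.Tensor, lam: float = 0.1):
    """pred, sharp: (B, C, H, W) restored and ground-truth images."""
    spatial = F.l1_loss(pred, sharp)
    freq = F.l1_loss(torch.abs(torch.fft.rfft2(pred)),
                     torch.abs(torch.fft.rfft2(sharp)))
    return spatial + lam * freq  # lam trades color fidelity vs. detail recovery
```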
- New
- Research Article
- 10.3390/data11010016
- Jan 12, 2026
- Data
- Yiquan Zou + 4 more
In intelligent construction and BIM–Reality integration applications, high-quality, large-scale construction scene point cloud data with component-level semantic annotations constitute a fundamental basis for three-dimensional semantic understanding and automated analysis. However, point clouds acquired from real construction sites commonly suffer from high labeling costs, severe occlusion, and unstable data distributions. Existing public datasets remain insufficient in terms of scale, component coverage, and annotation consistency, limiting their suitability for data-driven approaches. To address these challenges, this paper constructs and releases a BIM-derived synthetic construction scene point cloud dataset, termed the Synthetic Point Cloud (SPC), targeting component-level point cloud semantic segmentation and related research tasks. The dataset is generated from publicly available BIM models through physics-based virtual LiDAR scanning, producing multi-view and multi-density three-dimensional point clouds while automatically inheriting component-level semantic labels from BIM without any manual intervention. The SPC dataset comprises 132 virtual scanning scenes, with an overall scale of approximately 8.75×10⁹ points, covering typical construction components such as walls, columns, beams, and slabs. By systematically configuring scanning viewpoints, sampling densities, and occlusion conditions, the dataset introduces rich geometric and spatial distribution diversity. This paper presents a comprehensive description of the SPC data generation pipeline, semantic mapping strategy, virtual scanning configurations, and data organization scheme, followed by statistical analysis and technical validation in terms of point cloud scale evolution, spatial coverage characteristics, and component-wise semantic distributions. Furthermore, baseline experiments on component-level point cloud semantic segmentation are provided. The results demonstrate that models trained solely on the SPC dataset can achieve stable and engineering-meaningful component-level predictions on real construction point clouds, validating the dataset's usability in virtual-to-real research scenarios. As a scalable and reproducible BIM-derived point cloud resource, the SPC dataset offers a unified data foundation and experimental support for research on construction scene point cloud semantic segmentation, virtual-to-real transfer learning, scan-to-BIM updating, and intelligent construction monitoring.
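The key trick behind BIM-derived annotation, inheriting each point's label from the component its ray hits, can be illustrated with a simple ray cast; trimesh is used here as a stand-in for the paper's physics-based virtual LiDAR, and the function below is a sketch, not the SPC pipeline.

```python
# Label inheritance via ray casting (trimesh stand-in, not the SPC scanner).
import numpy as np
import trimesh

def virtual_scan(mesh: trimesh.Trimesh, face_labels: np.ndarray,
                 origin: np.ndarray, directions: np.ndarray):
    """face_labels: (F,) component label per mesh face. Returns labeled points."""
    locs, ray_idx, tri_idx = mesh.ray.intersects_location(
        ray_origins=np.tile(origin, (len(directions), 1)),
        ray_directions=directions, multiple_hits=False)
    return locs, face_labels[tri_idx]  # each hit point inherits its component label
```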
- New
- Research Article
- 10.3390/metrology6010004
- Jan 12, 2026
- Metrology
- Bo Shi + 2 more
To achieve low-cost and flexible wheel angle measurement, we propose a novel strategy that integrates a wheel segmentation network with 3D vision. In this framework, a semantic segmentation network is first employed to extract the wheel rim, followed by angle estimation through ICP-based point cloud registration. Since wheel rim extraction is closely tied to angle computation accuracy, we introduce APCS-SwinUnet, a segmentation network built on the SwinUnet architecture and enhanced with ASPP, CBAM, and a hybrid loss function. Compared with traditional image processing methods in wheel alignment, APCS-SwinUnet delivers more accurate and refined segmentation, especially at wheel boundaries. Moreover, it demonstrates strong adaptability across diverse tire types and lighting conditions. Based on the segmented mask, the wheel rim point cloud is extracted, and an iterative closest point algorithm is then employed to register the target point cloud with a reference one. Taking the zero-angle condition as the reference, the rotation and translation matrices are obtained through point cloud registration. These matrices are subsequently converted into toe and camber angles via matrix-to-angle transformation. Experimental results verify that the proposed solution enables accurate angle measurement in a cost-effective, simple, and flexible manner, and repeated experiments validate its robustness and stability.
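The final matrix-to-angle transformation mentioned above amounts to factoring the ICP rotation into Euler angles about the vehicle axes. The sketch below assumes an x-forward, z-up convention, which the abstract does not specify, so treat the axis mapping as illustrative.

```python
# ICP rotation matrix -> toe/camber under an assumed x-forward, z-up frame.
import numpy as np
from scipy.spatial.transform import Rotation

def wheel_angles(R: np.ndarray):
    """R: 3x3 rotation from the zero-angle reference to the measured wheel pose."""
    yaw, pitch, roll = Rotation.from_matrix(R).as_euler("zyx", degrees=True)
    return yaw, roll  # toe: rotation about z (vertical); camber: about x (longitudinal)
```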
- New
- Research Article
- 10.1093/ehjdh/ztaf143.108
- Jan 12, 2026
- European Heart Journal. Digital Health
- M Tokodi + 11 more
Background: Right ventricular (RV) function represents an important predictor of morbidity and mortality in various cardiovascular conditions. Nevertheless, its 2D echocardiographic assessment is challenging due to its complex anatomy and location in the chest, resulting in limited inter-observer reproducibility.
Purpose: We aimed to develop a novel deep learning model, EchoNet-RV, to segment the RV in apical 4-chamber view (A4C) echocardiographic videos and estimate RV fractional area change (RVFAC).
Methods: For training EchoNet-RV, 7,169 expert-annotated A4C echocardiographic videos were used. EchoNet-RV comprises two major components: one based on an R(2+1)D-18 architecture for spatiotemporal convolution and another based on a DeepLabV3+ architecture with a ResNet-50 backbone for semantic segmentation. The outputs of these two components are then combined to create beat-to-beat predictions of the RVFAC. The model's performance was evaluated on a hold-out test set of 1,320 A4C videos and two international external test sets of 3,107 and 1,077 A4C videos from two separate centers. Additionally, the associations between the predicted RVFAC values and the composite endpoint of heart failure hospitalization and all-cause death were also analyzed in the first external test set.
Results: EchoNet-RV segmented the RV with Dice coefficients of 0.893 (95% CI: 0.891–0.895), 0.797 (95% CI: 0.796–0.798), and 0.788 (95% CI: 0.785–0.790) and predicted RVFAC with mean absolute errors of 5.795 (95% CI: 5.560–6.031), 5.830 (95% CI: 5.692–5.970), and 6.363 (95% CI: 6.114–6.611) percentage points in the held-out test set and the two external test sets, respectively. In a randomly selected subset of the external test sets (n=500), EchoNet-RV's prediction error was significantly lower than inter-observer variability (mean absolute difference: 6.126 (95% CI: 5.735–6.563) vs. 9.699 (95% CI: 9.031–10.458) percentage points, p<0.001). Moreover, it identified RVFAC <35% with areas under the receiver operating characteristic curve of 0.859 (95% CI: 0.843–0.876), 0.725 (95% CI: 0.710–0.740), and 0.684 (95% CI: 0.653–0.713) in the three test sets. In the first external test set, predicted RVFAC values were inversely associated with the composite endpoint of heart failure hospitalization and all-cause death (adjusted HR: 0.948 [95% CI: 0.917–0.979], p<0.001), independent of age, sex, cardiovascular risk factors, and left ventricular systolic function.
Conclusion: EchoNet-RV enables the rapid and automated assessment of RVFAC, with strong potential to become a valuable tool for the echocardiographic evaluation of RV function and disease surveillance.
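RV fractional area change itself is a simple quantity: the relative drop in RV area from end-diastole to end-systole. Given per-frame masks like those EchoNet-RV produces, it can be computed as below (the definition is standard; the beat-detection logic here is a naive max/min stand-in for the model's beat-to-beat pipeline).

```python
# Standard RVFAC from per-frame masks (naive end-diastole/systole selection).
import numpy as np

def rvfac(masks: np.ndarray) -> float:
    """masks: (T, H, W) binary RV segmentations over one beat. Returns RVFAC in %."""
    areas = masks.reshape(masks.shape[0], -1).sum(axis=1)  # RV area per frame
    eda, esa = areas.max(), areas.min()  # end-diastolic / end-systolic areas
    return 100.0 * (eda - esa) / eda
```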
- New
- Research Article
- 10.1080/01431161.2025.2605793
- Jan 11, 2026
- International Journal of Remote Sensing
- Kai Cui + 8 more
ABSTRACT Rammed earth sites are vital heritage structures with considerable historical and cultural value. However, they face severe degradation due to natural erosion and human activities in arid Northwest China. Current methods for their extraction and monitoring remain limited, particularly in integrating multi-source remote sensing with advanced machine learning for precise localization and predictive analysis. In this study, we aimed to address these gaps by developing an accurate boundary extraction framework for rammed earth sites using multi-source remote-sensing data, and simulating their future spatiotemporal evolution to support proactive conservation. We compared three machine learning approaches: Object-Based Image Analysis combined with Convolutional Neural Networks (OBIA-CNN), Maximum Entropy Model-based Discrete Particle Swarm Optimization (MEDPSO), and U-Net-based semantic segmentation. OBIA-CNN outperformed MEDPSO and U-Net, achieving superior accuracy (OA = 97.46%, Kappa = 0.95) with strong anti-interference and generalization capabilities, effectively minimizing salt-and-pepper noise and preserving structural continuity. While achieving a high recall (0.9731), U-Net exhibited boundary expansion and over-segmentation, limiting its precision in delineating fine archaeological features. We applied the Markov-PLUS model to simulate land-use changes around four representative sites from 2023 to 2056 under natural scenarios, incorporating environmental and socioeconomic drivers. The model indicated critical transitions in land cover that threaten site preservation, enabling the identification of high-risk zones. This study provides an integrated framework that bridges high-precision site extraction with spatiotemporal simulation, offering a scientific basis for the sustainable conservation of rammed earth heritage in arid environments.
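The OA and Kappa scores reported above follow their standard confusion-matrix definitions, sketched below for reference (routine metric code, not the study's own scripts).

```python
# Overall accuracy and Cohen's kappa from a confusion matrix.
import numpy as np

def oa_kappa(cm: np.ndarray):
    """cm: (K, K) confusion matrix, rows = reference, cols = predicted."""
    n = cm.sum()
    oa = np.trace(cm) / n
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n ** 2  # chance agreement
    return oa, (oa - pe) / (1 - pe)
```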
- New
- Research Article
- 10.1038/s41598-026-35723-y
- Jan 11, 2026
- Scientific reports
- Wansong Zhang + 5 more
Semantic segmentation of remote sensing images has important application value in fields such as farmland anomaly detection and urban planning. However, the low-level features extracted by deep neural network models retain rich spatial detail information while introducing redundancy and noise. The significant differences in the semantic level and spatial distribution of high-level and low-level features pose challenges to their effective fusion. To this end, we propose a Multi-Feature Enhancement Fusion Network that improves local feature expression and global semantic modelling ability by fusing edge information and semantic information. The Edge Enhancement Module uses traditional edge detection operators to enhance the details of edge features. The Multi-Feature Fusion Module effectively integrates semantic and edge features to enhance the ability to express fine-grained information. The Local-Global Feature Enhancement Module hierarchically establishes local details and global context information, and the Multi-Level Fusion segmentation head integrates features from different levels to fully utilise both shallow spatial details and deep semantic information. Extensive experiments on three publicly available datasets demonstrate that the proposed model outperforms state-of-the-art methods. The code will be published at: https://github.com/zwsbh/MFEF.
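The Edge Enhancement Module is said to use traditional edge detection operators; a Sobel gradient magnitude, sketched below, is the canonical example of such an operator (the module's actual operator choice and how it injects the edge map are not specified in the abstract).

```python
# Sobel gradient magnitude as an example traditional edge operator.
import torch
import torch.nn.functional as F

def sobel_edges(x: torch.Tensor) -> torch.Tensor:
    """x: (B, 1, H, W) grayscale input. Returns a gradient-magnitude edge map."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)  # vertical-gradient kernel is the transpose
    gx = F.conv2d(x, kx.to(x), padding=1)
    gy = F.conv2d(x, ky.to(x), padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)
```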
- New
- Research Article
- 10.62051/ijcsit.v8n1.06
- Jan 11, 2026
- International Journal of Computer Science and Information Technology
- Xiujuan Liang + 1 more
As a key task in 3D scene understanding, point cloud semantic segmentation has broad application prospects in fields such as autonomous driving and robot navigation. Existing point cloud segmentation methods suffer from insufficient local feature extraction and a lack of effective integration of global contextual information, leading to inaccurate recognition and incomplete segmentation of categories with similar surface textures and geometric structures. In view of this, this paper proposes an improved point cloud segmentation method for RandLA-Net: (1) a local polar coordinate position encoding module is introduced to eliminate the impact of Z-axis rotation on feature learning; (2) a global information acquisition module composed of attention mechanisms is constructed to enhance the network's contextual perception ability; and (3) a hybrid pooling mechanism is integrated to improve the extraction of local features. The proposed method is evaluated on the self-built HPU dataset and the public datasets S3DIS and Toronto-3D. The results show that the improved network achieves mean intersection over union (mIoU) values of 90.7%, 71.2%, and 76.4%, respectively, demonstrating improvements over other algorithms. The model exhibits excellent generalization and segmentation performance in different types of point cloud scenes.
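Point (1) above relies on the fact that a neighbor's horizontal radius, height offset, and relative azimuth are unchanged by rotations about the Z axis. The sketch below shows one such encoding under the assumption that azimuths are referenced to the farthest neighbor; the paper's exact construction may differ.

```python
# One possible Z-rotation-invariant local polar encoding (reference choice assumed).
import numpy as np

def polar_position_encoding(center: np.ndarray, neighbors: np.ndarray) -> np.ndarray:
    """center: (3,), neighbors: (K, 3). Returns (K, 3) invariant codes."""
    d = neighbors - center
    rho = np.hypot(d[:, 0], d[:, 1])                  # horizontal radius
    theta = np.arctan2(d[:, 1], d[:, 0])              # absolute azimuth
    theta = theta - theta[np.argmax(rho)]             # reference: farthest neighbor
    theta = np.arctan2(np.sin(theta), np.cos(theta))  # wrap to [-pi, pi]
    return np.stack([rho, theta, d[:, 2]], axis=1)
```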