Articles published on Segmentation Of Regions
2543 Search results
- New
- Research Article
- 10.1016/j.neunet.2025.108509
- May 1, 2026
- Neural networks : the official journal of the International Neural Network Society
- Yichen Xiao + 14 more
Accurate segmentation of optic disc regions in fundus images is essential for diagnosing and monitoring the ocular fundus pathology of ophthalmic diseases such as myopia, glaucoma, and diabetic retinopathy. However, challenges such as irregular disc shapes, edge detail preservation, and limited annotated datasets hinder the effectiveness of existing methods. To handle these problems, we propose FDAU-Net, an efficient, fully convolutional network for optic disc segmentation. FDAU-Net integrates the Flexible Wavelet Convolution Block (FWC Block), which combines the dynamic receptive field adjustment of AKConv with wavelet down-sampling to achieve efficient feature extraction while retaining critical edge information. Additionally, we design the Selective Dual-Attention Fusion Gate (SDA Gate) to optimize feature selection and fusion, leveraging channel and spatial attention mechanisms to substantially enhance segmentation accuracy and robustness. To further improve performance, we introduce MedAugment, an automated data augmentation method tailored for medical images that efficiently boosts data diversity when annotated data are scarce. Experimental results on the iChallenge and IDRiD datasets demonstrate that FDAU-Net consistently achieves superior segmentation performance across key metrics, highlighting its potential for advancing clinical applications.
- New
- Research Article
- 10.1016/j.bspc.2026.109498
- May 1, 2026
- Biomedical Signal Processing and Control
- Jiahuan Lin + 3 more
BT-MCT: Brain tumor regions segmentation using multi-class token transformers for weakly supervised semantic segmentation
- New
- Research Article
- 10.1177/08953996261439080
- Apr 13, 2026
- Journal of X-ray science and technology
- Jing Zhou + 4 more
Accurate segmentation of cesarean scar disorder (CSDi) in ultrasound images is crucial for clinical diagnosis, disease monitoring, and personalized treatment. However, the ambiguous boundaries and complex anatomical structures of CSDi pose significant challenges. To address this, we propose UMamba-Dual, a dual-branch model derived from UMamba, designed to enhance segmentation performance in CSDi regions and provide reliable imaging support for clinical decision-making. UMamba-Dual integrates the strengths of two enhanced branches: Dual-Bot, incorporating squeeze-and-excitation (SE) attention, and Dual-Enc, employing a feature pyramid network (FPN) for improved feature representation and multi-scale perception. The training dataset included 1200 augmented 2D ultrasound images from 300 originals via flipping and rotation. An independent test set of 32 images was randomly selected and excluded from training and validation. Model performance was evaluated using Dice similarity coefficient, Intersection over Union (IoU), and Normalized Surface Dice (NSD), and compared with classical segmentation models such as nnUNet and VM-UNet. UMamba-Dual achieved superior performance with Dice = 0.832, IoU = 0.782, and NSD = 0.788, consistently outperforming both classical models (UNet, nnUNet) and the recent VM-UNet, as well as the internal baselines (UMamba_Bot, UMamba_Enc). UMamba-Dual enables more accurate and robust segmentation of CSDi regions in ultrasound images, particularly in cases characterized by ambiguous boundaries or irregular anatomical structures. These results highlight its potential for reliable clinical application.
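The Dice similarity coefficient and IoU named in the abstract above can be illustrated with a minimal sketch. It assumes binary masks stored as nested lists of 0/1; the function name `dice_and_iou` and the toy masks are hypothetical and are not taken from the paper, and NSD is not covered.

```python
def dice_and_iou(pred, truth):
    """Return (Dice, IoU) for two binary masks given as nested lists of 0/1."""
    p = [bool(v) for row in pred for v in row]
    t = [bool(v) for row in truth for v in row]
    if len(p) != len(t):
        raise ValueError("masks must have the same number of pixels")
    inter = sum(1 for a, b in zip(p, t) if a and b)  # |P ∩ T|
    p_sum, t_sum = sum(p), sum(t)                    # |P|, |T|
    union = p_sum + t_sum - inter                    # |P ∪ T|
    if union == 0:
        # Both masks empty: treated as perfect agreement (a convention, not a rule).
        return 1.0, 1.0
    return 2 * inter / (p_sum + t_sum), inter / union


# Toy 2x3 masks (illustrative values only).
pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
dice, iou = dice_and_iou(pred, truth)
```

For a single pair of masks the two scores are linked by IoU = Dice / (2 - Dice); scores averaged over a test set need not satisfy that identity.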
- New
- Research Article
- 10.3389/fmed.2026.1755311
- Apr 13, 2026
- Frontiers in Medicine
- Yuchun Li + 3 more
The annual mortality rate from prostate cancer (PCa), a common malignant neoplasm affecting middle-aged and elderly men, is on the rise. Biparametric magnetic resonance imaging (bpMRI) is indispensable to PCa imaging analysis since it can capture distinct disease-related information from two modalities that exhibit synergistic performance. The majority of state-of-the-art PCa diagnostic techniques currently available focus on a single modality or task, neglecting the information sharing across the two modalities and the task correlations inherent in multi-task learning. We propose a dual-modality image fusion and multi-task learning model that performs automatic PI-RADS grading and prostate and PCa region segmentation simultaneously. First, to extract complementary information between the prostate and PCa in bimodal images via T2-weighted imaging (T2WI) and diffusion-weighted imaging (DWI) feature extraction, a shared block fusion module and an independent encoder block were developed. Subsequently, in the encoder stage, a dual visual attention module was designed to extract features from multiple receptive fields and deliver more accurate contextual information, and a novel decoder was designed to effectively integrate encoder features, yielding more refined global and local detail information. Next, to capture more precise detail information during the classification task stage, a high-level feature fusion technique was developed. Finally, a multitask mixed loss function is proposed to address class imbalance. The segmentation results for the prostate and PCa on multiple diverse male pelvic MRI datasets demonstrate the superior performance of our proposed method. Both the basic performance evaluation and the comparative model evaluation have validated its effectiveness in prostate and PCa segmentation as well as automatic PI-RADS grading.
External validation on the independent PROMISE12 dataset further confirms the strong generalizability of our model across different institutions, scanning devices and patient cohorts.
- Research Article
- 10.1016/j.radi.2026.103403
- Apr 3, 2026
- Radiography (London, England : 1995)
- A Ayadi + 2 more
Lung cancer segmentation Using the Att-U-Net Model on PET-CT Images.
- Research Article
- 10.1158/1538-7445.am2026-6676
- Apr 3, 2026
- Cancer Research
- Nadine Nelson + 5 more
Glioblastoma (GBM) is a highly aggressive and spatially heterogeneous brain tumor with limited treatment options and poor prognosis. Effective therapeutic targeting requires a deeper understanding of the tumor microenvironment (TME), particularly the spatial relationships between malignant, immune, and stromal compartments. To address this, we employed the Cell DIVE™ multiplex immunofluorescence platform to perform high-plex spatial proteomic profiling of FFPE glioblastoma tissue sections. A customized panel of directly conjugated recombinant antibodies from Abcam was optimized to interrogate markers relevant to tumor biology, immune modulation, and stromal architecture. Following iterative staining and imaging cycles, multi-channel, high-resolution images were analyzed using Aivia, an AI-powered image analysis platform. The workflow enabled accurate segmentation of tissue regions and quantification of spatial protein expression patterns across distinct anatomical zones within GBM. Spatial relationships among protein expression patterns revealed region-specific immune infiltration and phenotypic transitions at tumor-stromal interfaces. Co-localization and proximity analysis further identified potential immune evasion signatures and microenvironmental structures associated with resistance phenotypes. This integrative approach provides a scalable and clinically compatible method for spatially resolved proteomic analysis of complex tumor tissues. The combination of Cell DIVE™ multiplex imaging, Abcam direct-conjugate antibodies, and Aivia-based analysis offers a powerful platform for dissecting the spatial biology of glioblastoma. The translational relevance of this study lies in its potential to inform biomarker development, guide spatially targeted therapies, and support precision medicine strategies in GBM.
This methodology is well positioned for integration into clinical research workflows and large-scale translational studies, with potential to inform personalized treatment strategies across neuro-oncology and other solid tumors. Citation Format: Nadine Nelson, Hoyin Lai, Sophie Struble, Richard A. Heil-Chapdelaine, Natasha F. Diaz Granados, Arindam Bose. Spatial proteomics and AI-driven analysis uncover therapeutic landscapes within the glioblastoma microenvironment [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2026; Part 1 (Regular Abstracts); 2026 Apr 17-22; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2026;86(7 Suppl):Abstract nr 6676.
- Research Article
- 10.3390/foods15071211
- Apr 2, 2026
- Foods (Basel, Switzerland)
- Qian Yan + 6 more
To address the low accuracy of traditional freshness detection/grading and poor adaptability to different shell colors in the duck egg industry, this study developed a non-destructive detection model and an integrated device for duck egg freshness based on machine vision combined with eggshell optical property analysis. A four-sided yolk transmission imaging system was designed, and accurate yolk region segmentation was achieved via grayscale conversion, a weighted improved Otsu algorithm for whole-egg segmentation, histogram equalization enhancement, and K-means clustering in the LAB color space. A relational model between the average four-angle yolk projected area ratio and Haugh Units (HU) freshness grades was constructed, with grading thresholds determined by constrained optimization combined with the Youden index to balance food safety and grading accuracy. Experimental results showed the model achieved an overall freshness grade discrimination accuracy of 91.3%, with a sensitivity of 97.1% and specificity of 98.9% for inedible duck eggs of Grade B and below (HU < 60). An automated testing device was further developed, adopting a roller-rotating motor collaborative mechanism for automatic flipping and imaging, and equipped with a 10 W/5500 K LED cool white light source to solve the problem of poor adaptability to different shell colors. The device achieved an overall discrimination accuracy of 88.5% with a detection time of ≤5 s per egg, and its host computer can output the yolk area ratio, predicted HU value, and freshness level in real time. This study provides a high-precision and low-cost technical solution for the refined grading of the poultry egg industry.
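The Youden index mentioned in the abstract above is J = sensitivity + specificity - 1, and a cutoff can be chosen by maximising it. The sketch below shows only that plain, unconstrained selection; the constrained optimisation the authors combine it with is not described in the abstract and is not reproduced. The function name, the sample area ratios, and the assumption that a larger ratio indicates a less fresh egg are all hypothetical.

```python
def youden_threshold(scores, labels):
    """Pick the cutoff that maximises J = sensitivity + specificity - 1.

    A sample is predicted positive when score >= cutoff.
    labels: 1 for the positive class, 0 for the negative class.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    best_cut, best_j = None, -1.0
    for cut in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cut)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cut)
        j = tp / pos + tn / neg - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j


# Hypothetical yolk area ratios; label 1 marks an egg assumed to be inedible.
ratios = [0.30, 0.32, 0.35, 0.41, 0.44, 0.47]
inedible = [0, 0, 0, 1, 0, 1]
cutoff, j_value = youden_threshold(ratios, inedible)
```

With these made-up values the cutoff lands at 0.41, where both positives are caught and three of the four negatives are rejected.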
- Research Article
- 10.1016/j.rsase.2026.102018
- Apr 1, 2026
- Remote Sensing Applications: Society and Environment
- Jabraeil Ranjbar + 2 more
SWUFlood: Segmentation of Flood-affected Regions in UAV Imagery Using an Enhanced Hybrid Approach Integrating Shifted Window Transformers with U-Net
- Research Article
- 10.1016/j.neuroimage.2026.121918
- Apr 1, 2026
- NeuroImage
- Marcel Daamen + 11 more
Long-term alterations of cerebellar structure after premature birth.
- Research Article
- 10.21037/qims-2025-1946
- Apr 1, 2026
- Quantitative imaging in medicine and surgery
- Weiye Cao + 4 more
Dermoscopic image segmentation plays a crucial role in computer-aided diagnostic (CAD) systems as an important tool for diagnosing skin lesions. The precise segmentation of lesion regions in dermatological images facilitates more objective and accurate diagnostic decision-making in clinical practice. However, many existing deep learning models face challenges in accurately segmenting lesion edges and are computationally complex, which limits their deployment on resource-constrained edge devices. Therefore, this study aimed to design a dermoscopic image segmentation model that simultaneously addresses the challenges of edge segmentation and maintains low computational complexity. We developed a wide edge-assisted lightweight dermoscopic image segmentation network (WENet) consisting of a lightweight encoder and a wide-edge-assisted decoder. The encoder is constructed using a squeeze dual-path convolution (SDPC), which adopts a bottleneck design and employs asymmetric convolutions with large and small dilation rates to significantly reduce model complexity while ensuring efficient feature extraction. It also integrates the statistical multi-feature adaptive channel recalibration attention (SACA) module for precise channel feature recalibration. The decoder consists of the wide-boundary generator (WBG), prediction information fusion decoding layer (PFDL), and progressive multi-scale feature fusion segmentation head (PMSSH). The WBG generates wide-edge labels by combining ground-truth annotations with morphological erosion and applies deep supervision to guide the model to learn boundary features, enhancing the model's edge segmentation performance without increasing the parameter count. The PFDL fuses region and boundary predictions with decoding features and employs a grouped design using the SDPC for feature extraction. It then enhances spatial feature information using group multi-axis Hadamard product attention (GHPA). 
The PMSSH progressively integrates multi-scale features to bridge the semantic gap across scales, ultimately producing the final segmentation map. WENet was evaluated on the International Skin Imaging Collaboration (ISIC) 2017, ISIC 2018, and Pedro Hispano 2 (PH2) datasets, and achieved mean intersection over union (mIoU) scores of 80.37%, 81.34%, and 85.98%, and specificity (Spe) values of 98.37%, 97.39%, and 96.23%, respectively, while maintaining a model size under 15 KB. The model has significantly fewer parameters compared to recent state-of-the-art models, while maintaining excellent segmentation performance. The proposed WENet presents an accurate yet computationally efficient solution for dermoscopic image segmentation, outperforming state-of-the-art methods in both model compactness and boundary segmentation precision.
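One way to picture a wide-edge label built from a ground-truth mask and morphological erosion, as the WBG description above suggests, is to keep the mask pixels that erosion removes. The sketch below is an assumption about the general idea, not the paper's construction: the 3x3 square structuring element, the `width` parameter, and the function names are hypothetical.

```python
def erode(mask):
    """One binary erosion step with a 3x3 square; pixels outside the image count as background."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            keep = True
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if not (0 <= ny < h and 0 <= nx < w) or mask[ny][nx] == 0:
                        keep = False
            out[y][x] = 1 if keep else 0
    return out


def wide_edge_label(mask, width=1):
    """Band of mask pixels removed by `width` erosion steps (mask AND NOT eroded)."""
    eroded = mask
    for _ in range(width):
        eroded = erode(eroded)
    return [[m & (1 - e) for m, e in zip(mask_row, eroded_row)]
            for mask_row, eroded_row in zip(mask, eroded)]


# 6x6 toy ground truth containing a 4x4 lesion (illustrative values only).
truth = [[1 if 1 <= y <= 4 and 1 <= x <= 4 else 0 for x in range(6)] for y in range(6)]
band = wide_edge_label(truth, width=1)
band_pixels = sum(sum(row) for row in band)
```

Raising `width` thickens the band, which is one plausible reading of "wide" edge supervision; the label is only needed at training time, so it adds no parameters to the model.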
- Research Article
- 10.1016/j.imavis.2026.105940
- Apr 1, 2026
- Image and Vision Computing
- Nishtha Tomar + 2 more
Attention inspired adversarial network for intratumoral brain regions segmentation
- Research Article
- 10.3390/s26072129
- Mar 30, 2026
- Sensors (Basel, Switzerland)
- Xiaolong Wei + 6 more
The 3D ultrasonic testing of aircraft composite laminated structures faces two bottlenecks: heavy reliance on manual operation, which lowers detection efficiency, and the inability of traditional robotic arms, which depend on predefined fixed trajectories, to adapt to complex curved surfaces. To address them, this paper proposes an automated 3D ultrasonic testing method based on 3D vision guidance for robotic arms. Firstly, the proposed Yolo-Mask model is adopted to realize the visual recognition and segmentation of composite component regions, after which the segmentation results are mapped to the depth map and further converted into the surface point cloud of the material. Secondly, on the basis of point cloud preprocessing and trajectory point extraction, the automatic planning of the robotic arm's scanning trajectory is achieved, which drives the robotic arm to perform precise motion and to synchronously collect spatial pose and ultrasonic testing data. Finally, 3D reconstruction is completed via a fusion algorithm, and 3D images of the material's internal structures are generated. Experimental verification shows that the proposed method achieves a Segm-mAP of 97.4%, a detection speed of 11.7 fps, and a 3D imaging error of less than 0.1 mm, thereby realizing fully automated detection throughout the entire process. This research provides an effective solution for the non-destructive testing of aircraft composite structures.
- Research Article
- 10.3390/diagnostics16071046
- Mar 30, 2026
- Diagnostics (Basel, Switzerland)
- Arturs Silovs + 14 more
Background: Parkinson's disease (PD) is characterized by progressive degeneration of dopaminergic neurons in the substantia nigra pars compacta, accompanied by neuromelanin loss. Neuromelanin-sensitive magnetic resonance imaging (NM-MRI) enables in vivo visualization of these changes; however, its diagnostic and clinical utility remains incompletely defined. This study evaluated the feasibility, reliability, and biological sensitivity of semi-automated NM-MRI-based substantia nigra volumetry in PD. Methods: In this prospective case-control study, 50 participants (25 PD patients and 25 healthy controls) underwent 3T NM-sensitive MRI using a high-resolution T1-weighted spin-echo sequence. Semi-automated segmentation of hyperintense substantia nigra regions was performed using Mango v3.5.1, with intracranial volume normalization derived from FreeSurfer v7.3. Four participants were excluded due to motion artifacts, yielding a final cohort of 46 subjects. Clinical assessment included the Unified Parkinson's Disease Rating Scale (UPDRS) Part III and Hoehn and Yahr (H&Y) staging. Group comparisons, receiver operating characteristic (ROC) analysis, and reliability testing using intraclass correlation coefficients (ICC) were performed. Results: Corrected substantia nigra volume was significantly reduced in PD patients compared with controls (18% reduction; p = 0.039, Mann-Whitney U test). Semi-automated measurements demonstrated excellent agreement with manual segmentation (ICC = 0.945). ROC analysis showed moderate discriminative performance for corrected volume (AUC = 0.700; sensitivity 68.4%, specificity 74.1%). No significant correlation was observed between corrected substantia nigra volume and UPDRS-III motor scores, while a trend toward lower SNc volume was observed with advancing H&Y stage. Conclusions: Semi-automated NM-MRI volumetry detects biologically meaningful substantia nigra volume loss in early-stage Parkinson's disease with high measurement reliability. 
However, diagnostic performance was moderate and insufficient for standalone clinical diagnosis or motor severity prediction. These findings support the role of NM-MRI as a complementary imaging marker within multimodal diagnostic and research frameworks rather than as an independent diagnostic tool.
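The AUC reported above is closely tied to the Mann-Whitney U statistic used for the group comparison: AUC equals the fraction of (patient, control) pairs that the marker ranks correctly, with ties counted as one half. A minimal sketch follows, assuming that a smaller corrected volume points toward PD; the volumes and the function name `auc_from_pairs` are hypothetical, not study data.

```python
def auc_from_pairs(pos_scores, neg_scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    if not pos_scores or not neg_scores:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical corrected volumes in mm^3 (illustrative only).
pd_volumes = [200, 220, 260]
control_volumes = [240, 270, 280]

# Lower volume suggests PD, so the sign is flipped to make "higher score = more PD-like".
auc = auc_from_pairs([-v for v in pd_volumes], [-v for v in control_volumes])
```

Eight of the nine pairs are ordered correctly here, so the sketch yields an AUC of 8/9.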
- Research Article
- 10.1038/s41598-026-46525-7
- Mar 29, 2026
- Scientific reports
- Xinhai Chen + 2 more
High-voltage overhead transmission line insulator strings are persistently exposed to harsh environments such as heavy rain, lightning, strong ultraviolet radiation, and significant temperature variations, which can readily lead to the deterioration of their electrical and mechanical performance. This degradation may result in defects like flashover and physical damage, posing a severe threat to grid security. To achieve efficient and accurate identification of defects in insulator strings, this paper proposes a detection method that integrates RGB color space analysis with multi-scale feature compensation. Firstly, preliminary segmentation of the insulator region is performed based on RGB component thresholds, and coarse localization of the insulator string is accomplished by combining morphological operations with non-overlapping window texture analysis. Subsequently, the minimum enclosing horizontal rectangle of the target is generated, enabling fine localization of the insulator string and segmentation of the horizontal rectangular region based on the texture features within the bounding box. Finally, a multi-scale compensation detection network is constructed, which employs multi-level detection heads to differentially compensate for missing high- and low-frequency information, thereby enhancing defect recognition accuracy. Experiments demonstrate that the proposed method significantly outperforms eight mainstream object detection models in terms of mean Average Precision (mAP). At an Intersection over Union (IoU) threshold of 0.5, the mAP of our method is approximately 4.1 percentage points higher than that of the best-performing baseline method.
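The first two localization steps described above, thresholding on RGB components and taking the minimum enclosing horizontal rectangle, can be sketched as follows. The threshold bounds, pixel colours, and function names are hypothetical; the morphological and texture-analysis stages of the paper are left out.

```python
def rgb_threshold_mask(image, lower, upper):
    """Mark pixels whose (r, g, b) values all fall within the inclusive per-channel bounds."""
    return [[1 if all(lo <= c <= hi for c, lo, hi in zip(pixel, lower, upper)) else 0
             for pixel in row]
            for row in image]


def bounding_rectangle(mask):
    """Minimum axis-aligned rectangle (top, left, bottom, right), inclusive; None if mask is empty."""
    coords = [(y, x) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not coords:
        return None
    ys = [y for y, _ in coords]
    xs = [x for _, x in coords]
    return min(ys), min(xs), max(ys), max(xs)


SKY = (90, 140, 200)       # hypothetical background colour
INSULATOR = (150, 60, 50)  # hypothetical insulator colour

image = [
    [SKY, SKY, SKY, SKY],
    [SKY, INSULATOR, INSULATOR, SKY],
    [SKY, SKY, INSULATOR, SKY],
]
# Assumed per-channel bounds; real values would be tuned on data.
mask = rgb_threshold_mask(image, lower=(120, 30, 20), upper=(200, 100, 90))
box = bounding_rectangle(mask)
```

The rectangle here is the region that a later, finer stage would crop and analyse.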
- Research Article
- 10.1007/s00170-026-17884-2
- Mar 27, 2026
- The International Journal of Advanced Manufacturing Technology
- Do Young Jeoung + 7 more
Laser powder bed fusion (L-PBF) is highly sensitive to process instabilities, making reliable in-situ anomaly detection essential for quality assurance. In this study, we propose a synergistic framework that integrates image preprocessing, CAD-guided region segmentation, and lightweight convolutional neural networks (CNNs) for layer-wise anomaly detection using CCD images acquired during L-PBF. Process-specific image preprocessing techniques, including histogram equalization and contrast enhancement, are employed to improve defect visibility under low-contrast conditions. Furthermore, CAD-guided segmentation is used to explicitly separate part and powder regions, enabling region-specific classification of anomalies such as super-elevation and recoater streaking. Lightweight CNN models are then trained on segmented regions to achieve robust performance with reduced computational cost. Experimental results obtained from twelve datasets demonstrate that the proposed framework significantly improves detection accuracy compared to raw-image-based learning, with accuracy increasing from below 23% to over 80% in representative cases, while maintaining stable performance across different anomaly types. The results confirm that combining process-aware preprocessing with region-specific learning effectively enhances anomaly discrimination and supports practical, low-compute deployment for real-time monitoring in industrial L-PBF systems.
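Histogram equalization, one of the preprocessing steps named above, remaps grey levels through the image's cumulative histogram so that a narrow intensity range is stretched over the full scale. The sketch below is the textbook global form for 8-bit images held as nested lists; it is an illustration and not the authors' process-specific pipeline, and the sample pixel values are made up.

```python
def equalize_histogram(image, levels=256):
    """Global histogram equalization for an integer grayscale image with values in [0, levels)."""
    flat = [v for row in image for v in row]
    if not flat:
        return []
    hist = [0] * levels
    for v in flat:
        hist[v] += 1
    # Cumulative distribution of grey levels.
    cdf, running = [0] * levels, 0
    for level, count in enumerate(hist):
        running += count
        cdf[level] = running
    cdf_min = next(c for c in cdf if c > 0)
    total = len(flat)
    if total == cdf_min:
        # Constant image: nothing to stretch.
        return [row[:] for row in image]
    return [[round((cdf[v] - cdf_min) * (levels - 1) / (total - cdf_min)) for v in row]
            for row in image]


# Low-contrast toy image: all values sit between 10 and 30.
low_contrast = [[10, 10, 10, 20],
                [20, 30, 30, 30]]
stretched = equalize_histogram(low_contrast)
```

After remapping, the three grey levels are spread across 0 to 255, which is the effect that makes faint defects easier to see.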
- Research Article
- 10.1186/s13048-026-02070-5
- Mar 25, 2026
- Journal of ovarian research
- Jie Shen + 7 more
High-grade serous ovarian cancer (HGSOC) is characterized by high incidence and mortality rates, yet effective detection methods remain limited. In the present study, deep learning models capable of identifying HGSOC while concurrently providing segmentation results of tumor regions and intratumoral solid components were developed and validated. This retrospective single-center diagnostic study included 395 patients (with a combined 1745 ultrasound images) with pathologically confirmed primary malignant ovarian tumors who underwent preoperative ultrasound examination between January 2018 and December 2022. The dataset was split into training (n = 237; 148 HGSOC, 89 Non-HGSOC), validation (n = 79; 49 HGSOC, 30 Non-HGSOC), and test (n = 79; 50 HGSOC, 29 Non-HGSOC) sets at a 6:2:2 ratio. Four models were constructed, all based on Residual Network-18 (ResNet-18) for feature extraction from grayscale and color Doppler ultrasound (US) images. Model A was built solely on image features, while Models B, C, and D further integrated additional features via a multilayer perceptron (MLP) as follows: Model B incorporated cancer antigen 125 (CA125); Model C combined CA125 and human epididymis protein 4 (HE4); and Model D integrated CA125, HE4, and age. All the models employed a dual-branch decoder for classification and segmentation to generate the corresponding outputs. The discriminative ability of the models was assessed using the area under the receiver operating characteristic curve (AUC), while classification performance was evaluated with the sensitivity and specificity. The Dice similarity coefficient (DSC) was used to evaluate segmentation performance. Among the 395 patients, 62.5% had HGSOC, while 37.5% were diagnosed with other pathological types. In the identification of HGSOC, the four models yielded AUC values of 0.72 (Model A), 0.81 (Model B), 0.80 (Model C), and 0.84 (Model D).
Compared with Model A, which solely used image features, Model D resulted in the most significant increase in the AUC value (p = 0.03). The sensitivity across the models ranged from 0.66 to 0.86, with Model B yielding the highest value (0.86); the specificity values ranged from 0.63 to 0.93, with Model D yielding the highest value (0.93). The DSC values ranged from 0.70 to 0.86. Deep learning models can be used to distinguish HGSOC and concurrently generate real-time segmentation results for relevant regions.
- Research Article
- 10.1088/1361-6501/ae572d
- Mar 25, 2026
- Measurement Science and Technology
- Xiaoqi Cheng + 5 more
To address the challenges of navigation path planning in Unmanned Aerial Vehicle (UAV)-based pipeline inspection missions arising from factors including long pipeline distances and dynamically varying installation environments, this paper proposes a visual navigation method for UAVs based on pipeline self-contour reconstruction. First, a mirror-based Field-of-View (FoV) conversion device is utilized to perform perspective transformation of the forward-facing camera's navigation view. The body-eye transformation is then computed using the camera's structural parameters. Second, a Fully Convolutional Network (FCN) is employed for pipeline region segmentation and edge contour extraction in complex backgrounds. Third, leveraging the extracted pipeline contour features as input, pipeline self-contour reconstruction is realized via the Constant Diameter Pipeline (CDP) perspective projection model. Finally, UAV navigation control is accomplished by mapping virtual pipeline axes to real-world coordinates through the body-eye transformation. Experimental results from computer simulations and indoor/outdoor experiments validate the proposed method. It facilitates ready-to-deploy autonomous vision-guided navigation for pipeline inspection UAVs without the need for pre-planned trajectories, offering improved operational efficiency and flexibility in UAV-based pipeline inspection scenarios.
- Research Article
- 10.1038/s41598-026-44325-7
- Mar 25, 2026
- Scientific reports
- Shlok Nandurbarkar + 2 more
The rapid growth of born-digital PDF documents has amplified the demand for fast, precise tabular data extraction on an industrial scale. State-of-the-art deep-learning approaches achieve high accuracy, but at the cost of substantial computational complexity, data-hungry training, and black-box opacity, which limits real-world deployment. In this paper, we introduce SPARTAN (Structured Parsing and Relevant Table Analysis), an entirely open-source, heuristic-based pipeline with high-fidelity table detection and extraction and no model training or GPU requirements. SPARTAN combines lightweight OpenCV image-processing modules (column whitespace analysis, boundary and text-based region segmentation, and line segment cell parsing) with a modular OCR layer and optional post-processing hooks for LLM-driven schema mapping. We evaluated SPARTAN on more than 20K pages of PCN-480 (Product Change Notification and Product Discontinuance Notification), scientific papers, certificates and datasheets and reported 0.94 precision, 0.91 recall and 0.93 F1-score, with 96.7% OCR character accuracy, processing a page in 2.8 s of CPU time on average, and requiring 1.2 GB peak RAM on the most demanding PDFs. Our model outperformed Tabula, Deepdoctection, TabbyPDF and EMbTTBF in accuracy and speed. It must be noted that this comparison was conducted against standard inference configurations for the learning-based baselines; a performance evaluation against highly-optimized, edge-device deployments is a consideration for future work. Its transparent rules cope with borderless, nested and merged-cell layouts that often defeat classical heuristics, without incurring the resource cost of end-to-end neural pipelines.
SPARTAN's CLI-governed, swap-in-swap-out architecture encourages domain tuning, edge deployment and cloud-scalable REST service wrapping, making it a practical bridge between brittle rule systems and heavyweight AI for document-understanding pipelines. The work proves that well-crafted modernized heuristics, combined with high-quality OCR, can match or even outperform state-of-the-art deep learning while remaining within reach of small and medium enterprises, thus re-opening a critical gate to cost-efficient, explainable PDF table extraction.
- Research Article
- 10.3390/s26072027
- Mar 24, 2026
- Sensors (Basel, Switzerland)
- Zichan Tan + 5 more
Motor vehicle encroachment into non-motorized lanes is a common but hard-to-verify violation in urban intersections, especially when monitored from unmanned aerial vehicles (UAVs) or high-mounted overhead views. Existing rule-based solutions built on horizontal bounding boxes and center-point/line-crossing criteria are sensitive to perspective distortion, occlusion, and frame-to-frame jitter, resulting in unstable decisions and low evidential value. This paper presents a cascaded UAV-view system that closes the loop from perception to evidence output through detection-segmentation-recognition-decision. First, we adopt a two-stage detection cascade: a lightweight vehicle detector localizes vehicles using axis-aligned bounding boxes, and a dedicated YOLOv5n-based oriented bounding box (OBB) license plate detector, constructed via architecture grafting and weight transfer, is then applied within each vehicle region of interest (ROI) to localize rotated license plates under large pose variation and small-target conditions. Second, a U-Net lane region segmentation module provides pixel-level spatial constraints to define an enforceable lane occupancy region. Third, a perspective rectification step is integrated with the PP-OCRv4 optical character recognition (OCR) framework to improve license plate recognition reliability for tilted plates. Finally, an area ratio criterion and an N-frame temporal counter are used to suppress transient misdetections and stabilize alarms. On a representative 100-sample controlled encroachment benchmark, the proposed system improves detection accuracy from 67.0% to 92.0% and reduces the false positive rate from 32.35% to 5.88% compared with a baseline horizontal bounding box (HBB)-based rule. The system outputs both violation alarms and license plate evidence, supporting practical deployment for multi-view traffic governance.
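The final decision stage described above, an area ratio criterion followed by an N-frame temporal counter, can be sketched as a small stateful filter. The 0.5 ratio threshold, the 3-frame window, and all names below are hypothetical, since the abstract does not state the values used.

```python
def occupancy_ratio(box, lane_mask):
    """Fraction of a vehicle box (top, left, bottom, right; inclusive) that lies on lane pixels."""
    top, left, bottom, right = box
    area = (bottom - top + 1) * (right - left + 1)
    inside = sum(lane_mask[y][x]
                 for y in range(top, bottom + 1)
                 for x in range(left, right + 1))
    return inside / area


class EncroachmentAlarm:
    """Raise an alarm only after the ratio criterion holds for n_frames consecutive frames."""

    def __init__(self, ratio_threshold=0.5, n_frames=3):
        self.ratio_threshold = ratio_threshold  # assumed value
        self.n_frames = n_frames                # assumed value
        self.streak = 0

    def update(self, ratio):
        # A single frame below the threshold resets the streak, suppressing transient hits.
        self.streak = self.streak + 1 if ratio >= self.ratio_threshold else 0
        return self.streak >= self.n_frames


# Toy lane mask: the two right-hand columns are the non-motorized lane.
lane_mask = [[0, 0, 1, 1] for _ in range(4)]
ratio_now = occupancy_ratio((1, 1, 2, 2), lane_mask)

alarm = EncroachmentAlarm(ratio_threshold=0.5, n_frames=3)
decisions = [alarm.update(r) for r in [0.7, 0.2, 0.6, 0.8, 0.9, 0.1]]
```

In this run the isolated hit in the first frame never triggers an alarm; only the third consecutive frame above the threshold does.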
- Research Article
- 10.3390/s26072029
- Mar 24, 2026
- Sensors (Basel, Switzerland)
- Linghao Dai + 4 more
Safety monitoring in container hoisting operations within rail-road intermodal logistics parks is a critical task in industrial safety management. Such scenarios are characterized by complex environments, large variations in target scales, deformable object shapes, and frequent occlusions, which pose significant challenges to visual perception systems. Conventional single-task models suffer from inherent limitations in handling low recall rates for distant small targets and insufficient adaptability to geometric deformations, making them inadequate for high-precision, real-time safety warning applications. To address these challenges, this study proposes a unified visual analysis framework that integrates semantic segmentation and object detection to enhance the recognition performance of small and deformable targets in complex operational environments, enabling real-time perception and safety warning of key objects and hazardous regions within container yards. Specifically, we introduce FSD-YOLO, a fusion-based architecture composed of the following key components. First, a SegFormer-based semantic segmentation module is employed to achieve pixel-level delineation of different operational regions. Second, an improved object detection network is developed based on the YOLOv8n architecture, incorporating: (1) the integration of C2f modules in the shallow layers of the backbone to enhance high-resolution feature extraction; (2) the embedding of C2fDCN modules within the detection head to improve modeling capability for deformable objects via deformable convolution; (3) the adoption of CARAFE upsampling operators to optimize multi-scale feature fusion; and (4) a dynamic loss-weighting strategy for small objects, where loss weights are adaptively adjusted according to target area to increase training emphasis on small-scale targets. 
Finally, a decision-level fusion strategy is applied to combine segmentation and detection outputs, enabling real-time safety judgment based on semantic rules. Experimental results on a self-constructed container yard dataset demonstrate that the proposed detection model achieves an mAP50-95 of 0.6433 and an mAP50 of 0.9565, significantly outperforming the baseline YOLOv8n model (mAP50-95: 0.5394, mAP50: 0.8435), thereby validating the effectiveness of the proposed framework.