Articles published on Multi-scale Features
- Research Article
- 10.1080/20964471.2025.2575602
- Nov 8, 2025
- Big Earth Data
- Hangchuan Xie + 5 more
ABSTRACT In recent years, object detection in remote sensing imagery has faced significant challenges, including scale variation, large inter-class differences, complex backgrounds, and confusion between objects and their surroundings. To tackle these issues, we propose ECHF-YOLO, a novel optical remote sensing detection framework built upon the YOLO architecture. ECHF-YOLO incorporates efficient convolution and high-low frequency feature fusion to enhance detection performance. Specifically, we introduce Dynamic Convolution, which enriches feature representation without increasing computational cost. Furthermore, we design a High-Low Frequency Adaptive Fusion Neck (HFAF-Neck) that improves the quality of upsampled features and restores detailed information through an enhanced Adaptive Fusion Dynamic Sample (AFDS) module and lightweight Group Shuffle Convolution (GSConv), balancing accuracy and efficiency. To better integrate shallow texture features with deep semantic cues, we propose the Context and Multi-Scale Convolution Guided Frequency Fusion (CMGFF) module for multi-scale adaptive feature fusion. Finally, a Dual-Task Branch Alignment Detector (DTBAD) is introduced to strengthen interaction between classification and localization tasks via task alignment, boosting overall detection accuracy. Experimental results on the DIOR, NWPU VHR-10, and RSOD datasets demonstrate the superiority of ECHF-YOLO, which surpasses several advanced detectors.
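The abstract names Dynamic Convolution as the efficiency-preserving feature extractor but gives no configuration, so below is a minimal PyTorch sketch of the common formulation of dynamic convolution: K parallel kernels mixed per sample by a softmax attention over global context. The kernel count `num_kernels` and the attention head are illustrative assumptions, not ECHF-YOLO's actual settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, k=3, num_kernels=4):
        super().__init__()
        self.num_kernels = num_kernels
        # K candidate kernels stored as one tensor: (K, out, in, k, k)
        self.weight = nn.Parameter(
            torch.randn(num_kernels, out_ch, in_ch, k, k) * 0.02)
        self.bias = nn.Parameter(torch.zeros(num_kernels, out_ch))
        # Lightweight attention: GAP -> FC -> softmax over the K kernels
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(in_ch, num_kernels))
        self.padding = k // 2

    def forward(self, x):
        b, c, h, w = x.shape
        alpha = F.softmax(self.attn(x), dim=1)            # (B, K)
        # Per-sample mixed kernel: sum_k alpha_k * W_k
        weight = torch.einsum('bk,koihw->boihw', alpha, self.weight)
        bias = alpha @ self.bias                          # (B, out)
        # Grouped-conv trick applies a different mixed kernel to each sample
        x = x.reshape(1, b * c, h, w)
        weight = weight.reshape(-1, weight.size(2), weight.size(3), weight.size(4))
        out = F.conv2d(x, weight, bias.flatten(), padding=self.padding, groups=b)
        return out.reshape(b, -1, h, w)
```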
- Research Article
- 10.1080/01431161.2025.2583601
- Nov 8, 2025
- International Journal of Remote Sensing
- Haoran Gong + 4 more
ABSTRACT Accurate 3D building reconstruction is crucial for advancing urban digital twinning, city planning, and sustainable development. As a key architectural component, rooftops facilitate urban energy management and inform urban morphological analysis. Consequently, achieving precise and scalable rooftop reconstruction has emerged as a key research focus in recent years. Point clouds, with their ability to preserve detailed geometric structures, are well suited for this task. However, existing methods predominantly target synthetic rooftop datasets, which lack architectural diversity and often require high-quality point clouds as input. These limitations hinder their applicability to large-scale, real-world urban environments characterized by varied rooftop designs and noisy or sparse data. To address these challenges, we propose a novel end-to-end framework for rooftop wireframe reconstruction from airborne laser scanning (ALS) point clouds. Our approach introduces a multi-scale local feature descriptor optimized for rooftops to enhance per-point geometric feature extraction. Then, a hypergraph-based attention fusion module integrates these features. After comprehensive feature learning by a robust backbone, initial corner detection is followed by a Transformer- and EdgeConv-enhanced edge classification mechanism that models topological relationships through long-range dependencies. Experiments on the large-scale real-world Building3D dataset demonstrate significant improvements over the baseline, with corner accuracy improved by 35% on the Entry-level subset and 41% on the Tallinn subset. Qualitative comparisons further reveal superior wireframe fidelity, underscoring the method’s potential to support digital twinning, urban management, and economic development in smart city initiatives.
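The edge classification stage builds on EdgeConv, a standard operation from DGCNN (Wang et al.): each point gathers its k nearest neighbours, forms edge features from the centre point and the neighbour offsets, applies a shared MLP, and max-pools over neighbours. A minimal sketch follows; the feature width and k are illustrative, not the paper's settings.

```python
import torch
import torch.nn as nn

def knn(x, k):
    # x: (B, N, C) point features; returns indices of k nearest neighbours
    dist = torch.cdist(x, x)                                  # (B, N, N)
    return dist.topk(k + 1, largest=False).indices[..., 1:]   # drop self

class EdgeConv(nn.Module):
    def __init__(self, in_ch, out_ch, k=16):
        super().__init__()
        self.k = k
        self.mlp = nn.Sequential(
            nn.Linear(2 * in_ch, out_ch), nn.ReLU(),
            nn.Linear(out_ch, out_ch))

    def forward(self, x):
        # x: (B, N, C)
        idx = knn(x, self.k)                                   # (B, N, k)
        neighbours = torch.gather(
            x.unsqueeze(1).expand(-1, x.size(1), -1, -1), 2,
            idx.unsqueeze(-1).expand(-1, -1, -1, x.size(-1)))  # (B, N, k, C)
        centre = x.unsqueeze(2).expand_as(neighbours)
        # Edge feature: [x_i, x_j - x_i], then shared MLP and neighbour max-pool
        edge_feat = torch.cat([centre, neighbours - centre], dim=-1)
        return self.mlp(edge_feat).max(dim=2).values           # (B, N, out)
```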
- Research Article
- 10.1038/s41598-025-25092-3
- Nov 7, 2025
- Scientific reports
- Lincen Jiang + 2 more
Accurate segmentation of tumor PET images is of great significance for disease monitoring and clinical treatment. However, accurately segmenting PET images remains challenging because their spatial features are difficult to exploit fully. Therefore, we propose a multiscale attention-guided feature mechanism for 3D tumor segmentation (MA3DSeg) to achieve accurate segmentation of tumor PET images. Specifically, we first utilize a 3D-DX block to construct a lightweight encoder that minimizes redundant information across channel features and reduces model complexity. Additionally, we employ a multiscale attention-guided feature enhancement (MAFE) module for 3D networks to fuse channel and spatial features and fully capture the spatial characteristics of tumors. Furthermore, we propose an interaction boundary semantic (IBS) module that exploits explicit and implicit difference information to enhance the representation of segmentation edges. We conduct rigorous experiments on three publicly available datasets, ECPC-IDS, Hecktor 2022, and AutoPET, demonstrating superior performance over other competitive networks. MA3DSeg significantly improves Dice by 0.5% on Hecktor 2022 and RVD by 2.1% on ECPC-IDS compared with 3D UX-Net. The experimental results show that MA3DSeg achieves excellent tumor segmentation performance and generalization ability.
- Research Article
- 10.1038/s41598-025-26802-7
- Nov 7, 2025
- Scientific reports
- Zhangxiong Zhu + 4 more
Karst reservoir Computed Tomography (CT) images exhibit blurred boundaries, scale variations, and complex structures. Existing 3D U-Net-based segmentation methods are inadequate in both detail recognition and overall structural representation. Therefore, this paper proposes an improved 3D U-Net architecture to adapt to the multi-scale and low-contrast characteristics of karst reservoirs. This paper introduces a multi-scale input path at the encoder end, extracting volumetric features at different resolutions in parallel to capture both fine-grained holes and large-scale channels. A spatial attention module is embedded in the skip connections to weight the encoded features to highlight boundaries and key regions. Multi-scale features are fused during the decoding phase to gradually reconstruct the three-dimensional space. Furthermore, the Dice loss is combined with the gradient-based boundary-aware loss during training. The latter enhances boundary sensitivity by calculating the 3D gradient difference between the predicted image and the label image. Experimental results show that the improved complete model achieves an 87.8% Dice coefficient and a 1.9-pixel boundary error in karst reservoir CT image segmentation, improving both regional overlap and boundary accuracy. This method effectively identifies karst structures at different scales, providing reliable data support for complex reservoir modeling and analysis.
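The training objective is described concretely enough to sketch: Dice loss plus a boundary-aware term built from the 3D gradient difference between prediction and label. Below is a minimal PyTorch version, assuming binary volumetric masks; the trade-off weight `lam` is an assumption.

```python
import torch

def dice_loss(pred, target, eps=1e-6):
    # pred, target: (B, 1, D, H, W), pred already in [0, 1]
    inter = (pred * target).sum(dim=(2, 3, 4))
    union = pred.sum(dim=(2, 3, 4)) + target.sum(dim=(2, 3, 4))
    return 1.0 - ((2 * inter + eps) / (union + eps)).mean()

def grad3d(x):
    # Forward finite differences along D, H, W
    dz = x[:, :, 1:, :, :] - x[:, :, :-1, :, :]
    dy = x[:, :, :, 1:, :] - x[:, :, :, :-1, :]
    dx = x[:, :, :, :, 1:] - x[:, :, :, :, :-1]
    return dz, dy, dx

def boundary_aware_loss(pred, target):
    # L1 difference between 3D gradients of prediction and label, which
    # penalises mismatched boundaries far more than flat interior regions
    loss = 0.0
    for gp, gt in zip(grad3d(pred), grad3d(target)):
        loss = loss + (gp - gt).abs().mean()
    return loss / 3.0

def total_loss(pred, target, lam=0.5):  # lam is an assumed trade-off weight
    return dice_loss(pred, target) + lam * boundary_aware_loss(pred, target)
```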
- Research Article
- 10.1088/1361-6501/ae1858
- Nov 7, 2025
- Measurement Science and Technology
- Zhiqin Zhang + 5 more
Abstract Insulators are critical components in transmission lines. Common defects, such as structural loss of the insulator caused by spontaneous rupture, breakage, and fouling, can lead to short circuits and tripping faults, posing serious threats to power grid stability and the safety of the power supply. However, in practical applications, insulator defect detection faces several challenges, including small target sizes, insufficient representation of multiscale features, complex backgrounds, and imbalanced datasets with a limited number of defective samples. Traditional detection methods often struggle with missed detections of small targets and lack robustness in scenarios with large-scale variations and complex environments. To address these issues, this paper proposes an enhanced detection model based on YOLOv8s. The model introduces an Iterative Attentional Feature Fusion (iAFF) module to optimize multiscale feature representation and incorporates a Generalized Dynamic Feature Pyramid Network (GDFPN) to improve feature retention for small target detection, thereby enhancing robustness in complex backgrounds. Additionally, to mitigate the problem of limited defective sample data, the Stable Diffusion generative model is utilized to augment the dataset, effectively improving detection performance in small-sample scenarios. Experimental results demonstrate that the proposed method significantly outperforms the original YOLOv8s model in terms of recall, accuracy, and precision on the insulator defect dataset. The model exhibits strong detection capabilities and generalization performance, making it well-suited for real-world challenges such as small targets, multiscale variation, and complex backgrounds.
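iAFF is a published module (Dai et al., WACV 2021), so a condensed sketch is possible: an MS-CAM attention mixes global (pooled) and local (pointwise) channel context, and the attentional fusion is applied twice, with the second pass refining the first. The channel reduction ratio `r` is an illustrative choice.

```python
import torch
import torch.nn as nn

class MSCAM(nn.Module):
    """Multi-scale channel attention: local + global context -> sigmoid gate."""
    def __init__(self, ch, r=4):
        super().__init__()
        mid = ch // r
        self.local = nn.Sequential(
            nn.Conv2d(ch, mid, 1), nn.BatchNorm2d(mid), nn.ReLU(),
            nn.Conv2d(mid, ch, 1), nn.BatchNorm2d(ch))
        self.glob = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, mid, 1), nn.ReLU(), nn.Conv2d(mid, ch, 1))

    def forward(self, x):
        return torch.sigmoid(self.local(x) + self.glob(x))

class iAFF(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.cam1, self.cam2 = MSCAM(ch), MSCAM(ch)

    def forward(self, x, y):
        # First pass produces an initial fusion; second pass refines it
        w1 = self.cam1(x + y)
        z = w1 * x + (1 - w1) * y
        w2 = self.cam2(z)
        return w2 * x + (1 - w2) * y
```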
- Research Article
- 10.1088/1361-6501/ae1854
- Nov 6, 2025
- Measurement Science and Technology
- Jian Tang + 4 more
Abstract Steel surface defect detection is essential to ensuring industrial product quality and production safety. However, traditional detection methods suffer from low detection efficiency and high false- and missed-detection rates, making them ill-suited to the fine-grained inspection demands of modern industry. Therefore, we propose an improved YOLOv11 object detection model named CPFS-YOLO. Firstly, to address the deficiency of traditional detection methods in multi-scale feature extraction, a CSP-partial multi-scale feature aggregation module (CSP_PMSFA) is proposed, which effectively enhances the expression of defect features. Secondly, to improve the model's perception of defects at different scales, a feature-pyramid shared convolution is introduced, enabling more efficient feature extraction without increasing computational cost. Finally, to optimize the detection of minor defects, a small-object enhancement pyramid is designed; by fusing global and local information, it improves the recognition of minor defects on steel surfaces. Experiments on the NEU-DET dataset show that the mAP@0.5 and mAP@0.5:0.95 of CPFS-YOLO reach 77.7% and 44.6% respectively, 3.6 and 3.2 percentage points higher than the baseline model. Compared with existing methods, CPFS-YOLO achieves superior detection accuracy while maintaining a lower computational cost, effectively reducing false and missed detections.
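The "feature pyramid shared Conv" is described only at a high level; the sketch below shows the general weight-sharing pattern it implies, namely one convolution block whose parameters are reused across all pyramid levels, so multi-scale perception improves without adding per-level parameters. The module layout is an assumption, not the paper's design.

```python
import torch
import torch.nn as nn

class SharedPyramidConv(nn.Module):
    def __init__(self, ch):
        super().__init__()
        # A single conv block applied to every level (shared weights)
        self.conv = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.SiLU())

    def forward(self, feats):
        # feats: list of (B, ch, Hi, Wi) maps from different pyramid levels
        return [self.conv(f) for f in feats]

# Usage: three pyramid levels share the same 3x3 conv parameters.
levels = [torch.randn(1, 64, s, s) for s in (80, 40, 20)]
out = SharedPyramidConv(64)(levels)
```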
- Research Article
- 10.1038/s41598-025-22834-1
- Nov 6, 2025
- Scientific reports
- Tao Zhou + 5 more
Multimodal medical image fusion plays an important role in clinical applications. However, existing multimodal medical image fusion methods ignore the feature dependence among modalities and fuse features of different granularities poorly. A Long-Range Correlation-Guided Dual-Encoder Fusion Network for Medical Images is proposed in this paper. The main innovations are as follows: Firstly, a Cross-dimension Multi-scale Feature Extraction Module (CMFEM) is designed in the encoder; by extracting multi-scale features and aggregating coarse-to-fine features, the model achieves fine-grained feature enhancement within each modality. Secondly, a Long-range Correlation Fusion Module (LCFM) is designed: by calculating the long-range correlation coefficient between local and global features, features of the same granularity are fused, long-range dependencies between modalities are captured, and features of different granularities are aggregated. Finally, the method is validated on a clinical multimodal lung medical image dataset and a brain medical image dataset. On the lung medical image dataset, the IE, AG, [Formula: see text], and EI metrics are improved by 4.53%, 4.10%, 6.19%, and 6.62% respectively. On the brain medical image dataset, the SF, VIF, and [Formula: see text] metrics are improved by 3.88%, 15.71%, and 7.99% respectively. The model achieves better fusion performance, which plays an important role in the fusion of multimodal medical images.
- Research Article
- 10.1007/s40534-025-00415-2
- Nov 6, 2025
- Railway Engineering Science
- Jindong Wang + 4 more
Abstract Wheelsets are crucial components of subway locomotives, and defects on their tread surfaces pose significant safety risks. This study presents an enhanced defect detection algorithm based on the YOLOv5 model, specifically designed to identify tread defects in subway wheelsets to meet the demands of intelligent maintenance. To improve the detection of small targets, we incorporate a multi-head self-attention module, which enhances the model’s ability to capture long-range dependencies within global feature maps. Additionally, a weighted bidirectional feature pyramid network is adopted to achieve balanced multi-scale feature fusion, enabling efficient cross-scale integration. To overcome limited labeled data and annotation inaccuracies, we propose a novel loss function (W-MPDIoU) to accelerate model convergence. Experimental results demonstrate that our enhanced model achieves 99.1% average precision—a 4.29% improvement over the original YOLOv5. With reduced parameters and a detection speed of 15 ms per image, the proposed solution enables real-time tread defect detection in subway systems, significantly improving safety and operational efficiency.
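W-MPDIoU presumably extends the published MPDIoU loss, which penalizes IoU by the normalized squared distances between the two boxes' top-left and bottom-right corners; the "W-" weighting is not described in the abstract, so only the base term is sketched below.

```python
import torch

def mpdiou_loss(pred, target, img_w, img_h, eps=1e-7):
    # Boxes in (x1, y1, x2, y2) format, shape (N, 4)
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)
    # Squared corner distances, normalised by the squared image diagonal
    d1 = (pred[:, 0] - target[:, 0]) ** 2 + (pred[:, 1] - target[:, 1]) ** 2
    d2 = (pred[:, 2] - target[:, 2]) ** 2 + (pred[:, 3] - target[:, 3]) ** 2
    diag = img_w ** 2 + img_h ** 2
    mpdiou = iou - d1 / diag - d2 / diag
    return (1 - mpdiou).mean()
```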
- Research Article
- 10.1038/s41598-025-22649-0
- Nov 6, 2025
- Scientific reports
- Benyamin Mirab Golkhatmi + 2 more
Medical image segmentation is crucial in accurately diagnosing diseases and assisting physicians in examining relevant areas. Therefore, there is a pressing need for an artificial intelligence-based model that can facilitate the diagnostic process and reduce errors. Existing networks often have large parameter counts, high computational cost in gigaflops, and limited accuracy. This research addresses this gap by proposing a transformer-based architecture. We developed a Swin transformer as an encoder to improve segmentation accuracy in medical images by leveraging deep feature extraction. This model can extract key image features more accurately by employing sliding windows and an attention mechanism. The overarching goal of this research is to design an optimized architecture for medical image segmentation that maintains high accuracy while reducing the number of network parameters and minimizing computational costs. In the decoder section, we designed the dynamic feature fusion block (DFFB) to enhance the extracted features, enabling the extraction of multi-scale features. This capability enables the model to analyze the structural information of medical images at various levels, resulting in improved performance in segmenting complex regions. We also employed the dynamic attention enhancement block to further enhance the features extracted from the DFFB output. This block utilizes spatial and channel attention mechanisms to emphasize key areas in the images, thereby enhancing the model's overall accuracy. The proposed model achieved strong segmentation performance across three medical datasets, obtaining a mean intersection over union (mIoU) of 0.9125 and a Dice score of 0.9542 on GlaS, an mIoU of 0.9174 and a Dice score of 0.9569 on PH2, and an mIoU of 0.9085 and a Dice score of 0.9521 on Kvasir-SEG. The experiments illustrate that the proposed model outperforms previous methods, demonstrating its potential as an effective tool in medical image segmentation.
- Research Article
- 10.3390/sym17111886
- Nov 6, 2025
- Symmetry
- Bing Li + 2 more
Black smoke emitted from diesel vehicles contains substantial amounts of hazardous substances. With the increasing annual levels of such emissions, there is growing concern over their detrimental effects on both the environment and human health. Therefore, it is imperative to strengthen the supervision and control of black smoke emissions. An effective approach is to analyze the smoke emission status of vehicles. Conventional object detection models often exhibit limitations in detecting black smoke, including challenges related to multi-scale target sizes, complex backgrounds, and insufficient localization accuracy. To address these issues, this study proposes a multi-dimensional detection algorithm. First, a multi-scale feature extraction method was introduced by replacing the conventional C2F module with a mechanism that employs parallel convolutional kernels of varying sizes. This design enables the extraction of features at different receptive fields, significantly improving the capability to capture black smoke patterns. To further enhance the network’s performance, a four-layer adaptive feature fusion detection head was proposed. This component dynamically adjusts the fusion weights assigned to each feature layer, thereby leveraging the unique advantages of different hierarchical representations. Additionally, to improve localization accuracy affected by the highly irregular shapes of black smoke edges, the Inner-IoU loss function was incorporated. This loss effectively alleviates the oversensitivity of CIoU to bounding box regression near image boundaries. Experiments conducted on a custom dataset, named Smoke-X, demonstrated that the proposed algorithm achieves a 4.8% increase in precision, a 5.9% improvement in recall, and a 5.6% gain in mAP50, compared to baseline methods. These improvements indicate that the model exhibits stronger adaptability to complex environments, suggesting considerable practical value for real-world applications.
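Inner-IoU is a published loss (Zhang et al., 2023): IoU is computed on auxiliary boxes scaled around each box centre by a ratio factor, which softens gradients on hard regression samples. A minimal sketch follows; `ratio=0.75` is an illustrative value, not the paper's setting.

```python
import torch

def inner_iou(pred, target, ratio=0.75, eps=1e-7):
    # Boxes in (x1, y1, x2, y2) format, shape (N, 4)
    def scale(box):
        # Auxiliary box of the same centre, width/height scaled by `ratio`
        cx, cy = (box[:, 0] + box[:, 2]) / 2, (box[:, 1] + box[:, 3]) / 2
        w = (box[:, 2] - box[:, 0]) * ratio
        h = (box[:, 3] - box[:, 1]) * ratio
        return torch.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], 1)

    p, t = scale(pred), scale(target)
    x1, y1 = torch.max(p[:, 0], t[:, 0]), torch.max(p[:, 1], t[:, 1])
    x2, y2 = torch.min(p[:, 2], t[:, 2]), torch.min(p[:, 3], t[:, 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)
    area_p = (p[:, 2] - p[:, 0]) * (p[:, 3] - p[:, 1])
    area_t = (t[:, 2] - t[:, 0]) * (t[:, 3] - t[:, 1])
    return inter / (area_p + area_t - inter + eps)
```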
- Research Article
- 10.1038/s41598-025-22862-x
- Nov 6, 2025
- Scientific reports
- Zhaoyan Xie + 4 more
Real-time high-precision image segmentation on mobile and edge devices remains challenging due to the limited ability of traditional convolutional networks to model long-range dependencies and their high computational cost at high resolutions. We propose PUNet, a lightweight parallel U-Net variant that integrates depthwise separable convolutions (DSConv) with a structured state-space Mamba module in a dual-path encoder. The core component, the parallel structured state-space encoder, employs two branches to efficiently capture local spatial features (via DSConv) and model global semantic dependencies (via the visual Mamba layer). At the same time, a squeeze-and-excitation skip connection adaptively fuses multi-scale features. With only 0.26 M parameters and linear computational complexity, PUNet enables real-time inference on resource-constrained platforms. Experiments on the CamVid and CRACK500 datasets demonstrate superior performance, achieving validation Dice scores of 0.9208 and 0.7902, and mean Intersection-over-Union of 0.8643 and 0.6612, respectively, significantly outperforming other lightweight models.
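The squeeze-and-excitation gating on PUNet's skip connections follows the standard SE recipe (Hu et al., 2018): pool to a channel descriptor, pass it through a bottleneck MLP, and re-weight channels. A minimal sketch of such a gated skip fusion, with the reduction ratio `r` as an assumption:

```python
import torch
import torch.nn as nn

class SESkip(nn.Module):
    def __init__(self, ch, r=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),   # squeeze: (B, C)
            nn.Linear(ch, ch // r), nn.ReLU(),       # bottleneck excitation
            nn.Linear(ch // r, ch), nn.Sigmoid())

    def forward(self, skip, up):
        # skip: encoder feature, up: upsampled decoder feature (same shape)
        fused = skip + up
        w = self.fc(fused).unsqueeze(-1).unsqueeze(-1)  # (B, C, 1, 1)
        return fused * w                                # channel re-weighting
```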
- Research Article
- 10.3390/app152111824
- Nov 6, 2025
- Applied Sciences
- Petra Radočaj + 2 more
Pneumonia remains a major global health concern, particularly among pediatric populations in low-resource settings where radiological expertise is limited. This study investigates the enhancement of deep convolutional neural networks (CNNs) for automated pneumonia diagnosis from chest X-ray images through the integration of a novel module combining Inception blocks, Mish activation, and Batch Normalization (IncMB). Four state-of-the-art transfer learning models—InceptionV3, InceptionResNetV2, MobileNetV2, and DenseNet201—were evaluated in their base form and with the proposed IncMB extension. Comparative analysis based on standardized classification metrics reveals consistent performance improvements across all models with the addition of the IncMB module. The most notable improvement was observed in InceptionResNetV2, where the IncMB-enhanced model achieved the highest accuracy of 0.9812, F1-score of 0.9761, precision of 0.9781, recall of 0.9742, and strong specificity of 0.9590. Other models also demonstrated similar trends, confirming that the IncMB module contributes to better generalization and discriminative capability. These enhancements were achieved while reducing the total number of parameters, indicating improved computational efficiency. In conclusion, the integration of IncMB significantly boosts the performance of CNN-based pneumonia classifiers, offering a promising direction for the development of lightweight, high-performing diagnostic tools suitable for real-world clinical application, particularly in underserved healthcare environments.
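The abstract names the IncMB ingredients (Inception branches, Mish activation, Batch Normalization) but not their exact arrangement, so the sketch below is one plausible layout: parallel 1x1/3x3/5x5 branches plus a pooled branch, each ending in BatchNorm and Mish, concatenated channel-wise. Branch widths are illustrative.

```python
import torch
import torch.nn as nn

class IncMB(nn.Module):
    def __init__(self, in_ch, branch_ch=32):
        super().__init__()
        def branch(k):
            # Conv -> BatchNorm -> Mish, one branch per kernel size
            return nn.Sequential(
                nn.Conv2d(in_ch, branch_ch, k, padding=k // 2),
                nn.BatchNorm2d(branch_ch), nn.Mish())
        self.b1, self.b3, self.b5 = branch(1), branch(3), branch(5)
        self.pool = nn.Sequential(
            nn.MaxPool2d(3, stride=1, padding=1),
            nn.Conv2d(in_ch, branch_ch, 1),
            nn.BatchNorm2d(branch_ch), nn.Mish())

    def forward(self, x):
        # Inception-style channel concatenation of all parallel branches
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.pool(x)], 1)
```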
- Research Article
- 10.3389/fpls.2025.1698427
- Nov 5, 2025
- Frontiers in Plant Science
- Zhuoran Xing + 7 more
Introduction The cigar leaf moisture content (CLMC) is a critical parameter for controlling curing barn conditions. With the continuous advancement of deep learning (DL) technologies, convolutional neural networks (CNNs) have offered a promising approach to the non-destructive estimation of CLMC during the air-curing process. Nevertheless, relying merely on single-perspective imaging makes it difficult to comprehensively capture the complementary morphological features of the front and back sides of cigar leaves during the air-curing process. Methods This study constructed a dual-view image dataset covering the air-curing process and proposes a regression framework named CADFFNet (channel attention weight-based dual-branch feature fusion network) for the non-destructive estimation of CLMC during the curing process based on dual-view RGB images. Firstly, the model utilizes two independent and parallel ResNet backbones to capture the heterogeneous features of dual-view images. Secondly, the Dual Efficient Channel Attention (DECA) module is introduced to dynamically adjust the channel attention weights of the features, thereby facilitating interaction between the two branches. Lastly, a multi-scale convolutional feature fusion (MSCFF) module is designed for the deep fusion of features from the front and back images, aggregating multi-scale features for robust regression. Results In five-fold cross-validation, CADFFNet attains an R2 of 0.974±0.007 and a mean absolute error (MAE) of 3.80±0.37%. On an independent cross-region, cross-variety testing set, it maintains strong generalization (R2=0.899, MAE=5.82%). Compared with the classic CNN models ResNet18, GoogLeNet, VGG19Net, DenseNet121, and MobileNetV2, its R2 value increases by 0.047, 0.041, 0.055, 0.098, and 0.090, respectively. Discussion Overall, the proposed CADFFNet offers an efficient and convenient method for the non-destructive detection of CLMC, providing a theoretical basis for automating the air-curing process. It also provides a new perspective for moisture content prediction during the drying of other crops, such as tea, asparagus, and mushrooms.
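DECA builds on efficient channel attention (ECA, Wang et al., 2020), whose base form is compact enough to sketch: global average pooling followed by a 1-D convolution across channels, with no dimensionality reduction. The dual cross-branch interaction of DECA is not detailed in the abstract, so only the base mechanism is shown.

```python
import torch
import torch.nn as nn

class ECA(nn.Module):
    def __init__(self, k=3):  # k: 1-D kernel size over the channel axis
        super().__init__()
        self.conv = nn.Conv1d(1, 1, k, padding=k // 2, bias=False)

    def forward(self, x):
        # x: (B, C, H, W)
        y = x.mean(dim=(2, 3))                     # squeeze: (B, C)
        y = self.conv(y.unsqueeze(1)).squeeze(1)   # local cross-channel interaction
        w = torch.sigmoid(y).unsqueeze(-1).unsqueeze(-1)
        return x * w                               # channel re-weighting
```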
- Research Article
- 10.1364/oe.573079
- Nov 5, 2025
- Optics Express
- Jiaxin Wu + 7 more
Single-pixel imaging technology has demonstrated unique advantages in weak light detection and special spectral sensing fields. However, its reconstruction performance is limited at low measurement rates, which remains a key challenge. Current mainstream reconstruction processes face issues such as the decoupling and underutilization of measurement information at low rates, as well as the loss of sample diversity caused by deterministic mappings, thereby restricting the ability to capture fine details in complex scenes. To address these challenges, this study proposes a dual-stage reconstruction framework with multi-feature prior fusion, WFD-SPI, which achieves deep analysis of measurement information and feature reconstruction optimization through innovative architectural design. Specifically, in the conditional generative pre-training phase, the Moore-Penrose generalized inverse operator is introduced to construct the pseudoinverse space of the measurement matrix, establishing a bidirectional mapping between measurement information and image information and effectively extracting latent feature representations. In the inverse diffusion reconstruction process, Fourier frequency-domain features and wavelet multi-scale spatial features are fused non-linearly, and a differentiable operator constraint is applied to align the joint probability distribution of the generated image with that of the real data, significantly enhancing the representation of high-frequency textures and edge information. Experimental results show that our method outperforms existing SPI techniques in terms of recovery quality at low measurement rates. Subsequent ablation studies further validate the effectiveness of our approach.
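The pseudoinverse conditioning step is concrete: the Moore-Penrose generalized inverse of the measurement matrix maps measurements back into image space, and the coarse estimate conditions the generative stage. A minimal sketch with illustrative sizes and an assumed random measurement matrix:

```python
import torch

n, rate = 64 * 64, 0.05                  # 64x64 scene, 5% measurement rate
m = int(n * rate)
A = torch.randn(m, n) / m ** 0.5         # assumed random measurement matrix
x = torch.rand(n)                        # ground-truth scene (flattened)
y = A @ x                                # single-pixel measurements

A_pinv = torch.linalg.pinv(A)            # Moore-Penrose generalized inverse
x0 = A_pinv @ y                          # coarse estimate conditioning the generator
print(x0.reshape(64, 64).shape)
```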
- Research Article
- 10.5194/isprs-archives-xlviii-1-w5-2025-153-2025
- Nov 5, 2025
- The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
- Leheng Xu + 4 more
Abstract. To address perception challenges for autonomous vehicles and drones in complex urban environments, this paper proposes a novel Bird’s Eye View (BEV) fusion method, MSA-BEVFusion, which integrates LiDAR and RGB cameras via multi-scale attention mechanisms. Unlike existing methods that tightly couple image and LiDAR features, or BEV-based approaches relying on simplistic convolutional fusion, our method first integrates multi-scale image features through the MFPN module and employs multi-scale attention enhancement to achieve deep fusion between camera and LiDAR features before feeding them into the detection head, ultimately delivering superior detection performance. Experiments on the nuScenes dataset demonstrate excellent performance, achieving 0.2% NDS and 0.4% mAP improvements over BEVFusion-MIT. The method shows robust 3D detection in dark, rainy, and snowy conditions, with enhanced accuracy for small or occluded objects. Attention heatmaps reveal effective cross-modal alignment, synergizing LiDAR’s geometric precision with camera texture details. This work bridges modality gaps through bidirectional interaction, advancing robust environmental perception while mitigating spatial discordance in unified BEV representations.
- Research Article
- 10.1364/oe.572280
- Nov 5, 2025
- Optics Express
- Yinhua Wu + 5 more
As an important method for exoplanet detection, the key to the radial velocity (RV) method lies in the high-precision measurement of stellar RV. The coherent-dispersion spectrometer (CODES), which obtains the RV by analyzing the Doppler phase shift of the stellar absorption-line interference spectrum, shows significant potential for the RV method due to its high energy utilization efficiency and excellent environmental stability. However, the background white light in the stellar absorption spectrum introduces phase noise into the CODES, which significantly degrades and destabilizes the RV inversion precision. To remove the phase noise caused by background white light, the dual-domain background white light prediction network (DDBWP-Net) is proposed, integrating spatial- and frequency-domain processing. In DDBWP-Net, firstly, the main energy of the background white light and the absorption line is separated through frequency-domain processing, and the global high-frequency and low-frequency features of the background white light interference spectrum are extracted. Secondly, the global high-frequency and low-frequency features are deeply fused by introducing a channel attention mechanism, enhancing the integrity of the feature representation. Thirdly, the multi-scale local detail features of the background white light interference spectrum are extracted using spatial convolution based on a double-layer residual structure. Finally, the prediction of the background white light interference spectrum is reconstructed. The experimental results show that, after removing the background white light predicted by DDBWP-Net from the original absorption-line interference spectrum, the RV inversion error is mainly distributed in the range of 0-0.15 m/s, with a mean error of 0.08 m/s and a root-mean-square error of 0.13 m/s. Moreover, with different absorption lines and different fixed optical path differences, the error distribution shows good uniformity. This indicates that DDBWP-Net can predict the background white light accurately and has strong stability and robustness, providing solid technical support for CODES to achieve high-precision exoplanet detection.
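The frequency-domain separation step can be illustrated with a simple low-pass/high-pass FFT split that isolates the slowly varying background white light from the absorption-line signal; the cutoff fraction below is an illustrative assumption, and the paper's actual separation is learned rather than a fixed filter.

```python
import torch

def split_low_high(signal, cutoff=0.05):
    # signal: (B, N) 1-D interference spectra
    spec = torch.fft.rfft(signal, dim=-1)
    k = max(1, int(spec.size(-1) * cutoff))
    low = spec.clone()
    low[..., k:] = 0                    # keep only low frequencies (background)
    high = spec.clone()
    high[..., :k] = 0                   # keep only high frequencies (lines)
    n = signal.size(-1)
    return (torch.fft.irfft(low, n=n, dim=-1),
            torch.fft.irfft(high, n=n, dim=-1))

sig = torch.randn(2, 1024)
low, high = split_low_high(sig)
assert torch.allclose(low + high, sig, atol=1e-4)  # split is exact by linearity
```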
- Research Article
- 10.1371/journal.pone.0335826
- Nov 5, 2025
- PLOS One
- Sebastian Medina + 3 more
Classification methods based on deep learning require selecting between fully-supervised or weakly-supervised approaches, each presenting limitations in uncertainty quantification and interpretability. A framework unifying both supervision modes while maintaining quantifiable interpretation metrics remains unexplored. We introduce WiSDoM (Weakly-Supervised Density Matrices), which uses kernel matrices to model probability distributions of input data and their labels. The framework integrates: (1) differentiable kernel density matrices enabling stochastic gradient descent optimization, (2) local-global attention mechanisms for multi-scale feature weighting, (3) data-driven prototype generation through kernel space sampling, and (4) ordinal regression through density matrix operations. WiSDoM was validated through supervised patch classification ( = 0.896) and weakly-supervised whole-slide classification ( = 0.930) on histopathology images. WiSDoM generates three quantifiable outputs: posterior probability distributions, variance-based uncertainty maps, and phenotype prototypes. Through validation in a Gleason grading task at a patch and whole-slide level using histopathology images, WiSDoM demonstrated consistent performance across supervision modes ( > 0.89) and prototype interpretability (0.88 expert agreement). These results show that kernel density matrices can serve as a foundation for classification models requiring both prediction interpretability and uncertainty quantification across supervision modes.
- Research Article
- 10.1007/s00330-025-12097-9
- Nov 5, 2025
- European radiology
- Chaowei Ma + 7 more
This study presents a novel deep learning-machine learning fusion network for quantitative and interpretable assessment of chest X-ray positioning, aiming to analyze critical factors in patient positioning layout. In this retrospective study, we analyzed 3300 chest radiographs from a Chinese medical institution, collected between March 2021 and December 2022. The dataset was partitioned into the XJ_chest_21 subset, for training an automated segmentation model, and the XJ_chest_22 subset, to validate three classification models: Random Forest Fusion Network (RFFN), Threshold Classification (TC), and Multivariate Logistic Regression (MLR). After automatically measuring five positioning indicators in the images, the data were input into the models to assess positioning quality. We compared the performance metrics of the three classification models, including AUC, accuracy, sensitivity, and specificity. SHAP (Shapley Additive Explanations) was utilized to interpret feature importance in the decision-making process of the RFFN model. We evaluated measurement consistency between the Automated Measurement Model (AMM) and radiologists. U-net++ demonstrated significantly superior performance compared to U-net in multi-target segmentation accuracy (mean Dice: 0.926 vs. 0.812). The five positioning metrics showed excellent agreement between the AMM and reference standards (r = 0.93). ROC analysis indicated that the RFFN performed significantly better in overall image quality classification (AUC, 0.982; 95% CI: 0.963, 0.993) than both TC (AUC, 0.959; 95% CI: 0.923, 0.995) and MLR (AUC, 0.953; 95% CI: 0.933, 0.974). Our study introduces a novel segmentation-based random forest fusion network that achieves accurate image positioning classification and identifies critical operational factors. Furthermore, the clinical interpretability of the fusion model was enhanced through the application of the SHAP method. Question How can AI-driven interpretable methods be utilized to assess patient positioning in chest radiography and enhance radiographers' accuracy? Findings The Random Forest Fusion Network (RFFN) outperformed Threshold Classification (TC) and Multivariate Logistic Regression (MLR) in positioning classification (AUC = 0.98). Clinical relevance An integrated framework that combines deep learning and machine learning achieves accurate image positioning classification, identifies critical operational factors, enables expert-level image quality assessment, and delivers automated feedback to radiographers.
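Reading SHAP feature importances off a random forest is a standard workflow, so a minimal sketch is possible; the five stand-in positioning indicators and the synthetic labels below are placeholders, not the study's data.

```python
import numpy as np
import shap
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                   # five positioning indicators
y = (X[:, 0] + 0.5 * X[:, 2] > 0).astype(int)   # synthetic quality label

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
explainer = shap.TreeExplainer(clf)
sv = explainer.shap_values(X)
# Class-1 attributions; the return shape differs across shap versions
sv = sv[1] if isinstance(sv, list) else sv[..., 1]
importance = np.abs(sv).mean(axis=0)            # global ranking of the 5 indicators
print(importance)
```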
- Research Article
- 10.1117/1.jei.34.6.063003
- Nov 5, 2025
- Journal of Electronic Imaging
- Caiwei Fang + 2 more
Multiscale feature enhancement network based on RT-DETR for object detection in UAV images
- Research Article
- 10.5194/isprs-annals-x-1-w2-2025-239-2025
- Nov 5, 2025
- ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences
- Anhao Yang + 5 more
Abstract. In the context of agricultural modernization, precise 3D organ segmentation has become indispensable for the automated extraction of phenotypic traits. In particular, the precise delineation of stem and leaf structures from 3D point clouds is critical for monitoring plant growth and supporting high-throughput breeding programs. However, the intricate structure of crops and the blurred boundaries between stems and leaves present significant challenges, leading to poor segmentation performance. To tackle these problems, we propose a Semantic Embedding-Guided Graph Self-Attention Network for stem-leaf separation in 3D point clouds, which addresses weak feature representation and low inter-class separability in complex plant structures. During the encoding stage, a multi-scale feature extraction module captures fine-grained local geometries, while a feature fusion module integrating graph convolution and self-attention facilitates deep fusion of local and global semantic information. In the decoding stage, hierarchical upsampling combined with multi-level feature fusion reconstructs high-resolution representations to achieve fine-grained segmentation. Furthermore, we introduce a joint loss function that integrates an inter-class discriminative loss with cross-entropy, aiming to optimize intra-class uniformity and reinforce class boundary delineation. Validation experiments on the Plant-3D dataset demonstrate that our methodology attains superior performance, with mean precision, recall, and IoU reaching 96.47%, 96.39%, and 93.50%, respectively. The proposed approach demonstrates high robustness and generalizability across diverse plant species and growth stages, providing an effective solution for high-throughput plant phenotyping.
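The joint loss couples cross-entropy with an inter-class discriminative term; its exact form is not given, so the sketch below follows the common pull/push formulation (in the spirit of De Brabandere et al., 2017): embeddings are pulled toward their class centroid within a margin, and centroids of different classes are pushed apart. Margins and the weight `lam` are illustrative.

```python
import torch
import torch.nn.functional as F

def discriminative_loss(emb, labels, d_pull=0.5, d_push=1.5):
    # emb: (N, D) point embeddings, labels: (N,) class ids
    classes = labels.unique()
    centroids = torch.stack([emb[labels == c].mean(0) for c in classes])
    # Variance term: pull points inside a d_pull margin of their centroid
    pull = 0.0
    for i, c in enumerate(classes):
        dist = (emb[labels == c] - centroids[i]).norm(dim=1)
        pull = pull + F.relu(dist - d_pull).pow(2).mean()
    pull = pull / len(classes)
    # Distance term: push each pair of centroids at least d_push apart
    push = 0.0
    if len(classes) > 1:
        pdist = torch.cdist(centroids, centroids)
        mask = ~torch.eye(len(classes), dtype=torch.bool)
        push = F.relu(d_push - pdist[mask]).pow(2).mean()
    return pull + push

def joint_loss(logits, emb, labels, lam=0.1):  # lam is an assumed weight
    return F.cross_entropy(logits, labels) + lam * discriminative_loss(emb, labels)
```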