Articles published on Feature Pyramid
- Research Article
- 10.1088/1361-6501/ae1aa2
- Nov 3, 2025
- Measurement Science and Technology
- Zhenlun Chen + 2 more
In complex orchard environments, the rapid and accurate detection of green tangerines is crucial for intelligent thinning and harvesting. However, factors such as branch and leaf occlusion, lighting variation, low contrast, and multi-scale targets severely affect detection accuracy. Additionally, existing deep learning methods have high computational overhead, which limits their application on edge devices, while lightweight methods struggle to balance both speed and accuracy. To address these challenges, this paper proposes a lightweight green tangerine detection model based on YOLOv11n, called SALC-Net. First, SPDConv and the innovative CSP-OmniKernel module are introduced, and an SCOK-FP feature pyramid is designed to facilitate efficient fusion of shallow and deep features, thereby improving small target detection accuracy. Second, an Adaptive Task Interaction Detection Head (ADTIH) is designed to enhance fine-grained feature recognition by jointly learning classification and localization features, while effectively handling multi-scale targets. Then, a pruning algorithm is applied to remove redundant parameters, meeting the deployment requirements for edge devices. Finally, channel distillation is employed to further improve accuracy. Experiments are conducted using the self-built GC-Dataset, which includes a variety of environmental conditions and shooting angles, facing challenges such as dense small targets (with small targets accounting for up to 97.1%), lighting changes, branch and leaf occlusion, and high target-background similarity. Experimental results show that SALC-Net improves the F1 score, mIoU, and AP@50 by 5.1%, 4.3%, and 5.0%, respectively, while the model parameters and weights are only 57.7% and 60.0% of those of YOLOv11n. Furthermore, on devices such as NX, i5, and i7, detection speeds of 31.2, 183.2, and 223.5 frames per second are achieved, meeting real-time requirements. This paper significantly enhances both accuracy and speed while reducing model complexity, demonstrating strong potential for deployment on edge devices.
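Channel distillation, as used in the final training stage above, is typically implemented by matching per-channel spatial distributions between a teacher and the pruned student. The sketch below is a generic channel-wise distillation loss (an assumed form for illustration, not necessarily SALC-Net's exact loss); the temperature `tau` and feature shapes are arbitrary.

```python
import torch
import torch.nn.functional as F

def channel_distillation_loss(student_feat, teacher_feat, tau=4.0):
    """Channel-wise distillation: match per-channel spatial distributions.

    student_feat, teacher_feat: (N, C, H, W) tensors with the same channel
    count (a 1x1 conv can align C in practice).
    """
    n, c, h, w = student_feat.shape
    # Flatten spatial dims and turn each channel into a probability map.
    s = F.log_softmax(student_feat.reshape(n, c, -1) / tau, dim=-1)
    t = F.softmax(teacher_feat.reshape(n, c, -1) / tau, dim=-1)
    # KL divergence averaged over the batch, then over channels, scaled by tau^2.
    return F.kl_div(s, t, reduction="batchmean") * (tau ** 2) / c

# Usage sketch: distill a teacher neck feature into the pruned student.
student = torch.randn(2, 64, 40, 40)
teacher = torch.randn(2, 64, 40, 40)
loss = channel_distillation_loss(student, teacher)
```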
- Research Article
- 10.1088/2631-8695/ae14b6
- Nov 3, 2025
- Engineering Research Express
- Jianguang Zhao + 2 more
To address the missed and false detections caused by numerous small and medium-sized targets, occlusion between targets, and rotational deviation of target positions, an improved small-target detection algorithm for UAV images based on YOLO11 (You Only Look Once 11) is proposed. Firstly, the C3DC module is constructed by combining DynamicConv with C3k2 to enhance the expression of diverse features. Secondly, the C2PDA module, which incorporates a Dynamic Position Bias (DPB) module, is introduced into the neck to reduce the model's sensitivity to target position deviation. Finally, the SCO module is proposed to improve the feature pyramid structure and strengthen small-target detection. Experiments on the VisDrone2019 dataset show that, compared with the original YOLO11 model, the improved model increases mAP50 by 3.2% to 42.2% and mAP50:95 by 2.2% to 25.6%. It also achieves 33.7% mAP50 on the UAVDT dataset, outperforming other SOTA models. Compared with existing YOLO models and their improved variants, the proposed model maintains high detection accuracy in complex scenarios. The results show that the algorithm significantly improves detection in UAV images of complex scenes, and its overall performance exceeds that of the YOLO11 model.
- Research Article
- 10.3390/agriculture15212290
- Nov 3, 2025
- Agriculture
- Zhedong Xie + 7 more
The pollination of kiwiberry flowers is closely related to fruit growth, development, and yield. Rapid and precise identification of flowers under natural field conditions plays a key role in enhancing pollination efficiency and improving overall fruit quality. Flowers and buds are densely distributed, vary in size, and exhibit similar colors. Complex backgrounds, lighting variations, and occlusion further challenge detection. To address these issues, the YOLO11s-RFBS model was proposed. The P5 detection head was replaced with P2 to improve the detection of densely distributed small flowers and buds. RFAConv was incorporated into the backbone to strengthen feature discrimination across multiple receptive field scales and to mitigate issues caused by parameter sharing. The C3k2-Faster module was designed to reduce redundant computation and improve feature extraction efficiency. A weighted bidirectional feature pyramid slim neck network was constructed with a compact architecture to achieve superior multi-scale feature fusion with minimal parameter usage. Experimental evaluations indicated that YOLO11s-RFBS reached a mAP@0.5 of 91.7%, outperforming YOLO11s by 2.7%, while simultaneously reducing the parameter count and model footprint by 33.3% and 31.8%, respectively. Compared with other mainstream models, it demonstrated superior comprehensive performance. Its detection speed exceeded 21 FPS in deployment, satisfying real-time requirements. In conclusion, YOLO11s-RFBS enables accurate and efficient detection of kiwiberry flowers and buds, supporting intelligent pollination robots.
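The weighted bidirectional feature pyramid neck mentioned above relies on learnable fusion weights. Below is a minimal sketch of the fast normalized fusion popularized by BiFPN (inputs assumed to be already resized and projected to a common shape upstream); it illustrates the general mechanism rather than the exact YOLO11s-RFBS neck.

```python
import torch
import torch.nn as nn

class WeightedFusion(nn.Module):
    """Fast normalized fusion: out = sum(w_i * x_i) / (sum(w_i) + eps)."""

    def __init__(self, num_inputs, eps=1e-4):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_inputs))
        self.eps = eps

    def forward(self, feats):
        # feats: list of tensors with identical shape (N, C, H, W),
        # already resized/projected to a common scale upstream.
        w = torch.relu(self.weights)            # keep fusion weights non-negative
        w = w / (w.sum() + self.eps)            # normalize so they sum to ~1
        return sum(wi * f for wi, f in zip(w, feats))

# Usage sketch: fuse a top-down feature with a lateral feature.
fuse = WeightedFusion(num_inputs=2)
p_out = fuse([torch.randn(1, 128, 40, 40), torch.randn(1, 128, 40, 40)])
```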
- Research Article
- 10.3390/s25216703
- Nov 2, 2025
- Sensors
- Quanxiang Wang + 2 more
In autonomous driving, detecting small and occluded objects remains a substantial challenge due to the complexity of real-world environments. To address this, we propose RSO-YOLO, an enhanced model based on YOLOv12. First, the bidirectional feature pyramid network (BiFPN) and space-to-depth convolution (SPD-Conv) replace the original neck network. This design efficiently integrates multi-scale features while preserving fine-grained information during downsampling, thereby improving both computational efficiency and detection performance. Additionally, a detection head for the shallower feature layer P2 is incorporated, further boosting the model’s capability to detect small objects. Second, we propose the feature enhancement and compensation module (FECM), which strengthens features in visible regions and compensates for missing semantic information in occluded areas. This module improves detection accuracy and robustness under occlusion. Finally, we propose a lightweight global cross-dimensional coordinate detection head (GCCHead), built upon the global cross-dimensional coordinate module (GCCM). By grouping and synergistically enhancing features, this module addresses the challenge of balancing computational efficiency with detection performance. Experimental results demonstrate that on the SODA10M, BDD100K, and FLIR ADAS datasets, RSO-YOLO achieves mAP@0.5 improvements of 8.0%, 10.7%, and 7.2%, respectively, compared to YOLOv12. Meanwhile, the number of parameters is reduced by 15.4%, and model complexity decreases by 20%. In summary, RSO-YOLO attains higher detection accuracy while reducing parameters and computational complexity, highlighting its strong potential for practical autonomous driving applications.
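SPD-Conv, referenced above, replaces strided downsampling with a space-to-depth rearrangement followed by a non-strided convolution, so fine-grained detail is moved into channels instead of being discarded. The sketch below is a generic version; channel counts and activation are illustrative, not the exact RSO-YOLO layer.

```python
import torch
import torch.nn as nn

class SPDConv(nn.Module):
    """Space-to-depth downsampling followed by a non-strided convolution."""

    def __init__(self, in_ch, out_ch, scale=2):
        super().__init__()
        # PixelUnshuffle moves each scale x scale spatial block into channels,
        # halving H and W while multiplying channels by scale**2.
        self.space_to_depth = nn.PixelUnshuffle(scale)
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch * scale * scale, out_ch, kernel_size=3,
                      stride=1, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.SiLU(inplace=True),
        )

    def forward(self, x):
        return self.conv(self.space_to_depth(x))

# Usage sketch: downsample a 64-channel map from 80x80 to 40x40.
layer = SPDConv(64, 128)
y = layer(torch.randn(1, 64, 80, 80))   # -> (1, 128, 40, 40)
```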
- Research Article
- 10.1016/j.bspc.2025.108064
- Nov 1, 2025
- Biomedical Signal Processing and Control
- Zhi Wang + 5 more
ISFPN: A medical detection model for ischemic stroke based on feature pyramid network and Swin Transformer
- Research Article
- 10.1016/j.jag.2025.104884
- Nov 1, 2025
- International Journal of Applied Earth Observation and Geoinformation
- Libo Wang + 5 more
PyramidMamba: Rethinking pyramid feature fusion with selective space state model for semantic segmentation of remote sensing imagery
- Research Article
- 10.63367/199115992025103605009
- Oct 31, 2025
- Journal of Computers
- Hsiao-Yu Wang + 2 more
Steel surface defect detection is critical to quality control, yet performance often degrades across datasets due to variations in imaging conditions, resolutions, and annotation styles. We propose a deep learning framework that combines multi-scale feature fusion, metric learning, and few-shot classification to improve cross-dataset robustness under scarce labels. A ResNet-50 backbone with a Feature Pyramid Network captures both fine-grained and high-level patterns across scales. On top of these features, a triplet-loss–based metric learning module enforces intra-class compactness and inter-class separability, mitigating domain shift. To handle rare or newly emerging defects, we employ a prototypical classifier that computes class prototypes from few labeled samples, enabling fast adaptation without extensive retraining. Evaluations on the NEU and Severstal datasets demonstrate superior generalization compared with Faster R-CNN and YOLO-series baselines, achieving higher mean Average Precision (mAP) and faster convergence. In few-shot settings (e.g., 1/5/10-shot), our approach maintains balanced precision–recall and substantially narrows the gap to full-data performance. These results indicate that integrating multi-scale representation learning with discriminative metric embeddings and prototype-based inference provides a scalable, data-efficient solution for reliable steel defect detection across heterogeneous production environments.
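The prototypical classifier described above reduces to computing a mean embedding per class from the few labeled support samples and assigning queries to the nearest prototype. A minimal sketch follows; the embedding network (ResNet-50 with FPN in the paper) is assumed to have already produced the vectors.

```python
import torch

def prototypical_classify(support_emb, support_labels, query_emb, num_classes):
    """Classify query embeddings by distance to class prototypes.

    support_emb: (S, D) embeddings of the few labeled defect samples.
    support_labels: (S,) integer class ids in [0, num_classes).
    query_emb: (Q, D) embeddings to classify.
    Returns (Q,) predicted class ids.
    """
    d = support_emb.shape[1]
    prototypes = torch.zeros(num_classes, d)
    for c in range(num_classes):
        # Prototype = mean embedding of the support samples of class c.
        prototypes[c] = support_emb[support_labels == c].mean(dim=0)
    # Smallest Euclidean distance to a prototype decides the class.
    dists = torch.cdist(query_emb, prototypes)          # (Q, num_classes)
    return dists.argmin(dim=1)

# Usage sketch with random 5-shot embeddings for 3 defect classes.
support = torch.randn(15, 256)
labels = torch.arange(3).repeat_interleave(5)
queries = torch.randn(4, 256)
preds = prototypical_classify(support, labels, queries, num_classes=3)
```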
- Research Article
- 10.1038/s42004-025-01718-5
- Oct 31, 2025
- Communications Chemistry
- Rajan Gyawali + 2 more
Cryo-electron microscopy (cryo-EM) is a key technology for determining the structures of proteins, particularly large protein complexes. However, automatically building high-accuracy protein structures from cryo-EM density maps remains a crucial challenge. In this work, we introduce MICA, a fully automatic and multimodal deep learning approach combining cryo-EM density maps with AlphaFold3-predicted structures at both input and output levels to improve cryo-EM protein structure modeling. It first uses a multi-task encoder-decoder architecture with a feature pyramid network to predict backbone atoms, Cα atoms, and amino acid types from both cryo-EM maps and AlphaFold3-predicted structures, which are used to build an initial backbone model. This model is further refined using AlphaFold3-predicted structures and density maps to build final atomic structures. MICA significantly outperforms other state-of-the-art deep learning methods in terms of both modeling accuracy and completeness, and is robust to protein size and map resolution. Additionally, it builds high-accuracy structural models with an average template-based modeling score (TM-score) of 0.93 from recently released high-resolution cryo-EM density maps, showing it can be used for real-world, automated, accurate protein structure determination.
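For context, the feature pyramid network inside MICA's encoder-decoder follows the standard top-down pattern: 1x1 lateral convolutions, upsample-and-add fusion, and 3x3 smoothing. The sketch below shows that generic pattern only, not MICA's multi-task heads or its AlphaFold3 inputs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleFPN(nn.Module):
    """Standard FPN top-down fusion over backbone features C3, C4, C5."""

    def __init__(self, in_channels=(256, 512, 1024), out_ch=256):
        super().__init__()
        self.lateral = nn.ModuleList(nn.Conv2d(c, out_ch, 1) for c in in_channels)
        self.smooth = nn.ModuleList(nn.Conv2d(out_ch, out_ch, 3, padding=1)
                                    for _ in in_channels)

    def forward(self, feats):
        # feats: [C3, C4, C5], ordered from high to low resolution.
        laterals = [l(f) for l, f in zip(self.lateral, feats)]
        # Top-down: upsample the coarser map and add it to the lateral below.
        for i in range(len(laterals) - 1, 0, -1):
            laterals[i - 1] = laterals[i - 1] + F.interpolate(
                laterals[i], size=laterals[i - 1].shape[-2:], mode="nearest")
        return [s(p) for s, p in zip(self.smooth, laterals)]  # [P3, P4, P5]

# Usage sketch on dummy backbone outputs.
fpn = SimpleFPN()
c3 = torch.randn(1, 256, 64, 64)
c4 = torch.randn(1, 512, 32, 32)
c5 = torch.randn(1, 1024, 16, 16)
p3, p4, p5 = fpn([c3, c4, c5])
```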
- Research Article
- 10.1038/s41598-025-21756-2
- Oct 30, 2025
- Scientific Reports
- Dejin Zhao + 9 more
Large-aperture optics play a pivotal role in high-performance optical systems, and the presence of micro-defects on their surfaces can significantly degrade system performance and reliability. Traditional methods for detecting these defects face challenges due to their small size, multiplicity, and complexity. This paper proposes an improved Mamba-based defect classification method, DC-Mamba, specifically designed for detecting surface micro-defects on large-aperture optics. DC-Mamba replaces the original model’s scanning mechanism and neck structure with a multi-axis interactive 2D Selective Scan (MISS2D) and a Multi-axis Interactive Feature Pyramid Network (MIFPN), achieving coordinated modeling of positional information, hierarchical structures, spatial relationships, and channel-wise features in the input feature maps. Evaluation on the NEU-DET dataset demonstrates that DC-Mamba, with MACSA, increases the AP from 41.7% to 45.7% and AP50 from 72.2% to 74.7%, compared to the original VMamba model. Furthermore, DC-Mamba achieves an AP of 64.3% on our self-made optic surface micro-defect (OSMD) dataset. By effectively distinguishing micro-defects from interference points, DC-Mamba provides a robust solution for intelligent defect detection on large-aperture optics surfaces.
- Research Article
- 10.3390/app152111507
- Oct 28, 2025
- Applied Sciences
- Eun Seok Jun + 2 more
Accurate angular alignment of wafers is essential in ion implantation to prevent channeling effects that degrade device performance. This study proposes a real-time notch-angle-detection system based on you only look once version 8 with oriented bounding boxes (YOLOv8-OBB). The proposed method compares YOLOv8 and YOLOv8-OBB, demonstrating the superiority of the latter in accurately capturing rotational features. To enhance detection performance, hyperparameters, including the initial learning rate (Lr0), weight decay, and optimizer, are tuned using a one-factor-at-a-time (OFAT) approach followed by grid search. Architectural improvements, including spatial pyramid pooling fast with large selective kernel attention (SPPF_LSKA), a bidirectional feature pyramid network (BiFPN), and a high-resolution detection head (P2 head), are incorporated to improve small-object detection. Furthermore, a gradual unfreezing strategy is employed to support more effective and stable transfer learning. The final system is evaluated over 100 training epochs and tracked up to 5000 epochs to verify long-term stability. Compared to baseline models, it achieves higher accuracy and robustness in angle-sensitive scenarios, offering a reliable and scalable solution for high-precision wafer-notch detection in semiconductor manufacturing.
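The gradual unfreezing strategy mentioned above can be realized by freezing the whole pretrained network, keeping only the head trainable, and releasing layer groups on a schedule. The sketch below uses hypothetical stage names and an illustrative schedule, not the exact YOLOv8-OBB recipe.

```python
import torch.nn as nn

def freeze_all(module: nn.Module):
    for p in module.parameters():
        p.requires_grad = False

def unfreeze(module: nn.Module):
    for p in module.parameters():
        p.requires_grad = True

# Hypothetical model with named stages; real YOLO backbones differ.
model = nn.ModuleDict({
    "stem":   nn.Conv2d(3, 16, 3, padding=1),
    "stage1": nn.Conv2d(16, 32, 3, padding=1),
    "stage2": nn.Conv2d(32, 64, 3, padding=1),
    "head":   nn.Conv2d(64, 8, 1),
})

freeze_all(model)
unfreeze(model["head"])                 # always train the task head

# Unfreeze deeper -> shallower groups as training progresses (illustrative schedule).
schedule = {10: "stage2", 20: "stage1", 30: "stem"}
for epoch in range(40):
    if epoch in schedule:
        unfreeze(model[schedule[epoch]])
    # ... run one training epoch, rebuilding the optimizer over
    # [p for p in model.parameters() if p.requires_grad] when groups change ...
```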
- Research Article
- 10.1088/1361-6501/ae1858
- Oct 28, 2025
- Measurement Science and Technology
- Zhiqin Zhang + 5 more
Insulators are critical components in transmission lines. Common defects, such as structural loss of the insulator caused by spontaneous rupture, breakage, and fouling, can lead to short circuits and tripping faults, posing serious threats to power grid stability and the safety of the power supply. However, in practical applications, insulator defect detection faces several challenges, including small target sizes, insufficient representation of multiscale features, complex backgrounds, and imbalanced datasets with a limited number of defective samples. Traditional detection methods often struggle with missed detections of small targets and lack robustness in scenarios with large-scale variations and complex environments. To address these issues, this paper proposes an enhanced detection model based on YOLOv8s. The model introduces an Iterative Attentional Feature Fusion (iAFF) module to optimize multiscale feature representation and incorporates a Generalized Dynamic Feature Pyramid Network (GDFPN) to improve feature retention for small target detection, thereby enhancing robustness in complex backgrounds. Additionally, to mitigate the problem of limited defective sample data, the Stable Diffusion generative model is utilized to augment the dataset, effectively improving detection performance in small-sample scenarios. Experimental results demonstrate that the proposed method significantly outperforms the original YOLOv8s model in terms of recall, accuracy, and precision on the insulator defect dataset. The model exhibits strong detection capabilities and generalization performance, making it well-suited for real-world challenges such as small targets, multiscale variation, and complex backgrounds.
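Attentional feature fusion of the kind iAFF builds on blends two same-shaped feature maps through a learned gate computed from their sum. The sketch below is a simplified, single-pass version for illustration; the actual iAFF module also uses a local pointwise-convolution attention branch and applies the fusion iteratively.

```python
import torch
import torch.nn as nn

class SimpleAFF(nn.Module):
    """Blend two same-shaped features with a learned channel-attention gate."""

    def __init__(self, channels, reduction=4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x, y):
        a = self.gate(x + y)            # (N, C, 1, 1) attention weights in [0, 1]
        return a * x + (1.0 - a) * y    # the gate decides which source dominates

# Usage sketch: fuse a lateral feature with an upsampled top-down feature.
fuse = SimpleAFF(channels=128)
out = fuse(torch.randn(1, 128, 40, 40), torch.randn(1, 128, 40, 40))
```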
- Research Article
- 10.3390/sym17111807
- Oct 27, 2025
- Symmetry
- Hengqi Hu + 6 more
Cardiac medical image segmentation can advance healthcare and embedded vision systems. In this paper, a symmetric semantic segmentation architecture for cardiac magnetic resonance (MR) images based on a symmetric multiscale detail-guided attention network is presented. Detailed information and multiscale attention maps can be exploited more efficiently in this model. A symmetric encoder and decoder are used to generate high-dimensional semantic feature maps and segmentation masks, respectively. First, a series of densely connected residual blocks is introduced for extracting high-dimensional semantic features. Second, an asymmetric detail-guided module is proposed. In this module, a feature pyramid extracts detailed information and generates detailed feature maps that guide the model during the training phase; these maps capture multiscale deep features and are used to compute a detail loss against the corresponding encoder semantic features. Third, a series of multiscale upsampling attention blocks symmetrical to the encoder is introduced in the decoder of the model. For each upsampling attention block, feature fusion is first performed on the previous-level low-resolution features and the symmetric skip connections of the same layer, and then spatial and channel attention are used to enhance the features. Image gradients of the input images are also introduced at the end of the decoder. Finally, the network is trained by combining a detail loss and a segmentation loss to produce the predicted segmentation masks. Our method demonstrates outstanding performance on the public cardiac MR image dataset, achieving accurate endocardial and epicardial segmentation of the left ventricle (LV).
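The training objective above combines a segmentation loss with an auxiliary detail loss. The sketch below shows one plausible combination (Dice plus BCE for segmentation, an L2 feature-matching term for the detail guidance, and an assumed weighting `lam`); the paper's exact loss terms may differ.

```python
import torch
import torch.nn.functional as F

def dice_loss(logits, target, eps=1e-6):
    """Soft Dice over the foreground probability map (binary case)."""
    prob = torch.sigmoid(logits)
    inter = (prob * target).sum(dim=(1, 2, 3))
    denom = prob.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
    return 1.0 - ((2 * inter + eps) / (denom + eps)).mean()

def total_loss(seg_logits, seg_target, detail_feat, encoder_feat, lam=0.4):
    """Segmentation loss (Dice + BCE) plus a weighted detail-guidance loss."""
    seg = dice_loss(seg_logits, seg_target) + F.binary_cross_entropy_with_logits(
        seg_logits, seg_target)
    # Align the detail-branch features with the corresponding encoder features.
    detail = F.mse_loss(detail_feat, encoder_feat)
    return seg + lam * detail

# Usage sketch with dummy tensors (binary LV mask).
logits = torch.randn(2, 1, 128, 128)
target = (torch.rand(2, 1, 128, 128) > 0.5).float()
loss = total_loss(logits, target, torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32))
```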
- Research Article
- 10.3389/fpls.2025.1644271
- Oct 27, 2025
- Frontiers in Plant Science
- Jinghuan Hu + 2 more
In order to overcome the key challenges associated with detecting tomato leaf disease in complex agricultural environments, such as leaf occlusion, variation in lesion size and light interference, this study presents a lightweight detection model called ToMASD. This model integrates multi-scale feature decoupling and an adaptive alignment mechanism. The model comprises a dual-branch adaptive alignment module (TAAM) that achieves cross-scale lesion semantic alignment via a dynamic feature pyramid, a local context-aware gated unit (Faster-GLUDet) that uses a spatial attention mechanism to suppress background noise interference, and a multi-scale decoupling detection head (MDH) that balances the detection accuracy of small and diffuse lesions. On a dataset containing six types of disease under various weather conditions, ToMASD achieves an average precision of 84.3%, outperforming thirteen mainstream models by margins of 4.7% to 12.1%. The computational load is compressed to 7.1 GFLOPs. Through the introduction of a transfer learning paradigm, the pre-trained weights of the tomato disease detection model can be transferred to common bean and potato detection tasks. Through domain adaptation layers and adversarial feature decoupling strategies, the domain shift problem is overcome, achieving an average precision of 92.7% on the target crop test set. False detection rates in foggy and strong light conditions are controlled at 6.3% and 9.8%, respectively. This study achieves dual breakthroughs in terms of both high-precision detection in complex scenarios and the cross-crop generalization ability of lightweight models. It provides a new paradigm for universal agricultural disease monitoring systems that can be deployed at the edge.
- Research Article
- 10.1080/15481603.2025.2565866
- Oct 25, 2025
- GIScience & Remote Sensing
- Zhifu Zhu + 6 more
Generative adversarial networks (GANs) possess powerful image translation capabilities. They can transform images acquired from different sensors into a unified domain, effectively mitigating the incomparability problem caused by imaging discrepancies in multimodal remote sensing change detection (CD). However, existing approaches predominantly emphasize domain unification while neglecting the loss of fine-grained features inherent in the translation process, consequently compromising both image translation quality and CD accuracy. To overcome these limitations, we propose a novel texture and structure interaction guided GAN (TSIG-GAN). This network establishes interactive guidance between image texture and structural features through a carefully designed dual-stream cross encoder-decoder architecture, enabling in-depth mining of fine-grained features and significantly improving the fidelity of translated images. Furthermore, to address the spatial scale diversity and complexity of remote sensing images, we develop a multi-scale adaptive feature pyramid (MAFP) module and a contextual semantic interaction guidance (CSIG) mechanism, aiming to further strengthen the model's robust representation of fine-grained features across multiple scales and complex scenes. Specifically, the MAFP module effectively captures spatial details of targets at different resolutions by dynamically integrating multi-scale features, thereby preventing detail loss in small objects due to scale discrepancies. The CSIG mechanism achieves deep interaction between texture and structural features at the contextual semantic level, further promoting their mutual cooperation, thereby enhancing the consistency of fine-grained feature representation and semantic integrity in complex scenes. Finally, the translated fine-grained images are fed into a custom CD network to extract changes. To evaluate the effectiveness of the proposed method, we conducted systematic experiments on five representative real-world datasets and performed comparative analysis with sixteen state-of-the-art multimodal CD methods. The experimental results demonstrate that TSIG-GAN achieves significant improvements in both image translation and CD performance, exhibiting superior fine-grained restoration capability and change identification capability.
- Research Article
- 10.3390/pr13113416
- Oct 24, 2025
- Processes
- Xiao Ji + 3 more
Real-time and accurate identification of the blast furnace (BF) condition is essential for maintaining stability and improving energy efficiency in steelmaking. However, the harsh environment inside the BF makes direct acquisition of the BF condition extremely difficult. To address this challenge, this study proposes an online BF condition recognition method based on spatiotemporal texture feature coupling and diffusion networks (STFC-DN). The method employs a multi-domain Swin-Transformer module (MDSTM) combined with wavelet decomposition and channel attention to extract the gas flow region. A temporal feature pyramid network module (T-FPNM) is then used to capture both the global and local spatiotemporal characteristics of this region. Heuristic clustering and an idempotent generative network (IGN) are introduced to obtain standardized BF condition features, enabling intelligent classification through multi-metric similarity analysis. Experimental results show that the proposed STFC-DN achieves an average accuracy exceeding 98% when identifying four BF conditions: normal, hanging, oblique stockline, and collapsing, with an inference speed of approximately 28 FPS. This approach demonstrates both high accuracy and real-time capability, showing strong potential for advancing the intelligent and sustainable development of the steel industry.
- Research Article
- 10.1038/s41598-025-21100-8
- Oct 24, 2025
- Scientific Reports
- Shiqiang Deng + 5 more
Accurate real-time heat release rate (HRR) measurement is critical in fire safety engineering. This study proposes a cascaded deep learning framework integrating visual flame detection and thermodynamic analysis. The detection module uses an enhanced YOLOv8n with efficient channel attention (ECA) and a bidirectional feature pyramid network (BiFPN), achieving 95.2% precision and 88.3% recall. For HRR quantification, a parameter-efficient dual-branch CNN (5.2 M) processes spatial and frequency-domain flame features, showing superior performance (R2 = 0.976) with 58.5%/78.2% fewer parameters than VGG16/ResNet50 and lower MAE than Vision Transformers (32.47 vs. 41.98). Validated on the NIST fire database, the framework demonstrates robust performance across flame growth and decay phases, with some deviations around peak HRR. The cascaded design ensures efficiency by activating HRR analysis only after detection, which reduces false alarms to a certain extent and establishes a new non-contact fire risk assessment paradigm.
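Efficient channel attention (ECA), used in the detection module above, gates channels with a 1D convolution over the globally pooled channel descriptor. The sketch below fixes the kernel size at 3 for simplicity, whereas the original ECA paper derives it adaptively from the channel count.

```python
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient Channel Attention: 1D conv over the pooled channel descriptor."""

    def __init__(self, k_size=3):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k_size, padding=k_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        n, c, _, _ = x.shape
        y = self.pool(x).view(n, 1, c)              # (N, 1, C): channels as a 1D sequence
        y = self.sigmoid(self.conv(y)).view(n, c, 1, 1)
        return x * y                                # rescale each channel

# Usage sketch on a neck feature map.
eca = ECA()
out = eca(torch.randn(1, 256, 20, 20))
```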
- Research Article
- 10.1038/s41598-025-21837-2
- Oct 23, 2025
- Scientific Reports
- Na Shi + 8 more
It is very challenging to accurately and promptly detect small objects containing only dozens of pixels in infrared images. In infrared images taken by low-altitude drones, separating objects from the complex background requires learning a strong feature representation, which usually incurs a large computational cost. In this paper, we propose a Super Mamba (SMamba) framework for UAV infrared small object detection, which performs deep learning on nonlinear, complex data. Our SMamba framework performs high-resolution detection of multi-scale objects while considering both detection accuracy and computational cost. First, Receptive Field Attention Convolution (RFAConv) is introduced into the backbone network to replace standard convolution, and multi-scale features are adjusted through a dynamic receptive field to optimize computational efficiency. Furthermore, a Spatial Attention Mechanism (SAM) and Squeeze-Excitation (SE) are added to the State Space Model (SSM) to achieve multi-scale, multi-feature extraction for small objects. Moreover, in the neck, a Feature Enhancement Module (FEM) introduced into the Bidirectional Feature Pyramid Network (BiFPN) enhances the local context information of small objects and improves detection efficiency. The experimental results show that Super Mamba achieves more than 92% mAP@0.5 on the VEDAI dataset, more than 20% higher than existing models such as YOLOv5, YOLOv8, and YOLOv11. The PyTorch code is available at: https://github.com/wolfololo/Super-Mamba-A-Framework-for-Small-Object-Detection-with-Enhanced-Detection.
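Squeeze-and-Excitation (SE), one of the attention mechanisms added above, reweights channels from a globally pooled descriptor passed through a small bottleneck MLP. The sketch below is the standard standalone block, independent of how it is wired into the state space model here.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: reweight channels from a global descriptor."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        n, c, _, _ = x.shape
        s = x.mean(dim=(2, 3))                 # squeeze: (N, C) global average
        w = self.fc(s).view(n, c, 1, 1)        # excite: per-channel weights in [0, 1]
        return x * w

# Usage sketch on a backbone feature map.
se = SEBlock(channels=64)
out = se(torch.randn(1, 64, 40, 40))
```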
- Research Article
- 10.3390/s25216521
- Oct 23, 2025
- Sensors
- Zonghang Li + 7 more
The liquid reservoir is a critical component of the automotive air conditioning system, and weld seams on its surface may exhibit defects of various types, shapes, and scales, which traditional detection methods struggle to detect effectively. In this article, we propose a YOLOv8s-based algorithm to detect liquid reservoir weld defects. In order to improve feature fusion within the neck and enhance the model’s capacity to detect defects showing substantial size variations, the neck is optimized through the integration of the improved Reparameterized Generalized Feature Pyramid Network (RepGFPN) and the addition of a small-object detection head. To further improve the capacity to identify complex defects, the Spatial Pyramid Pooling Fast (SPPF) module in YOLOv8s is substituted with Focal Modulation Networks (FocalNets). Additionally, the Cascaded Group Attention (CGA) mechanism is incorporated into the improved neck to minimize the propagation of redundant feature information. Experimental results indicate that the improved YOLOv8s achieves a 6.3% improvement in mAP@0.5 and a 4.3% improvement in mAP@0.5:0.95 compared to the original model. The AP values for detecting craters, porosity, undercuts, and lack-of-fusion defects improve by 3.9%, 13.5%, 5.0%, and 2.5%, respectively. We conducted comparative experiments against other state-of-the-art models on the liquid reservoir weld dataset and the steel pipe weld defect dataset, and the results show that our model has outstanding detection performance.
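For reference, the SPPF block that is replaced above pools one projected feature map with three chained max-pools (equivalent receptive fields of roughly 5, 9, and 13) and concatenates the results. The sketch below is a generic reimplementation of that pattern, not the Ultralytics source.

```python
import torch
import torch.nn as nn

class SPPF(nn.Module):
    """Spatial Pyramid Pooling - Fast: chained max-pools over one feature map."""

    def __init__(self, in_ch, out_ch, k=5):
        super().__init__()
        hidden = in_ch // 2
        self.cv1 = nn.Conv2d(in_ch, hidden, 1)
        self.cv2 = nn.Conv2d(hidden * 4, out_ch, 1)
        self.pool = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)

    def forward(self, x):
        x = self.cv1(x)
        y1 = self.pool(x)            # successive pools widen the receptive field
        y2 = self.pool(y1)
        y3 = self.pool(y2)
        return self.cv2(torch.cat([x, y1, y2, y3], dim=1))

# Usage sketch at the end of a backbone.
sppf = SPPF(512, 512)
out = sppf(torch.randn(1, 512, 20, 20))
```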
- Research Article
- 10.62762/tetai.2025.654854
- Oct 23, 2025
- ICCK Transactions on Emerging Topics in Artificial Intelligence
- Chandra Sekhar Koppireddy + 2 more
Object detection is a fundamental problem in computer vision, with applications spanning self-driving cars, surveillance systems, medical imaging, robotics, and smart cities. Among the myriad of algorithms developed for this task, the You Only Look Once (YOLO) family stands out for its ability to perform real-time and accurate object detection. This article provides a comprehensive analysis of the YOLO algorithm series, from YOLOv1 to YOLOv8, evaluating them across key performance metrics, including precision, recall, mean Average Precision (mAP), frames per second (FPS), and overall effectiveness. Unlike traditional two-stage detectors such as R-CNN, YOLO formulates object detection as a single regression problem: a single pass over the image simultaneously predicts bounding boxes and class probabilities. This end-to-end design enables YOLO to achieve high speed while maintaining a competitive accuracy-efficiency trade-off. We examine architectural innovations across YOLO versions, including batch normalization, anchor boxes, residual blocks, feature pyramid networks, and attention mechanisms, and discuss their impact on performance. Lightweight models (e.g., YOLOv5-Nano, YOLOv8-Small) are explored with a focus on their suitability for mobile and embedded systems, highlighting YOLO’s adaptability to resource-constrained environments. Challenges such as small object detection, occlusion, and domain-specific tuning are also addressed. This article serves as a practical guide for researchers, developers, and practitioners aiming to leverage YOLO for real-world object detection tasks.
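Because YOLO predicts all boxes in a single pass, overlapping detections are typically pruned with non-maximum suppression (NMS) before results are reported. Below is a minimal greedy NMS sketch on (x1, y1, x2, y2) boxes; class-aware handling and batched tensors are omitted for brevity.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter + 1e-9)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression; returns indices of kept boxes."""
    order = np.argsort(scores)[::-1]          # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Drop remaining boxes that overlap the kept box too much.
        order = rest[iou(boxes[i], boxes[rest]) < iou_thresh]
    return keep

# Usage sketch: two overlapping boxes, the higher-scoring one survives.
boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], dtype=float)
print(nms(boxes, np.array([0.9, 0.8, 0.7])))   # -> [0, 2]
```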
- Research Article
- 10.3389/fphy.2025.1650714
- Oct 23, 2025
- Frontiers in Physics
- Keya Yuan + 1 more
The precise extraction of laser spot edges plays a fundamental role in optical measurement systems, yet traditional methods struggle with noise interference and varying spot characteristics. Existing approaches face significant challenges in achieving robust subpixel accuracy across diverse experimental conditions, particularly for irregular spots and low signal-to-noise scenarios. This article presents a novel multi-scale adaptive convolution framework that integrates three key innovations: (1) dynamic kernel adjustment based on local intensity gradients, (2) hierarchical feature pyramid architecture combining spatial details with semantic features, and (3) subpixel localization through Gaussian surface fitting and gradient extremum analysis. Extensive experiments demonstrate the method’s superior performance, achieving 0.12-pixel root mean square error (RMSE) on standard Gaussian beams (vs. 0.38 for Canny), maintaining 0.15-pixel accuracy with aberrated spots, and showing remarkable robustness at 5 dB SNR (0.28-pixel RMSE). The results establish that our hybrid approach successfully bridges physical modeling with data-driven adaptation, delivering unprecedented precision (0.91 temporal–spatial consistency) for laser-based applications ranging from industrial metrology to biomedical imaging. The ablation studies further confirm the critical importance of both multi-scale adaptation (61% accuracy drop when removed) and analytical modeling (0.842 F1-score without Gaussian fitting), providing valuable insights for future edge detection research.
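The subpixel step described above, Gaussian fitting around the gradient extremum, reduces in one dimension to fitting a parabola through the logarithm of the peak sample and its two neighbors. The sketch below shows that textbook refinement on a 1D gradient profile; the paper's full 2D Gaussian surface fit is more involved.

```python
import numpy as np

def subpixel_peak(grad_mag):
    """Subpixel location of the gradient-magnitude peak along a 1D profile.

    Fits a Gaussian through the maximum sample and its two neighbors by
    fitting a parabola to the log of the (positive) values.
    """
    i = int(np.argmax(grad_mag))
    if i == 0 or i == len(grad_mag) - 1:
        return float(i)                       # no neighbors to fit against
    lm, l0, lp = np.log(grad_mag[i - 1:i + 2] + 1e-12)
    denom = lm - 2 * l0 + lp
    if denom >= 0:                            # degenerate (flat or non-peak) case
        return float(i)
    offset = 0.5 * (lm - lp) / denom          # parabolic interpolation in log space
    return i + offset

# Usage sketch: a Gaussian edge response peaking at x = 10.3.
x = np.arange(21, dtype=float)
profile = np.exp(-0.5 * ((x - 10.3) / 1.5) ** 2)
print(subpixel_peak(profile))   # ~10.3
```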