Abstract
To address the critical challenges of unmanned aerial vehicle (UAV) infrared detection, including dense occlusion, low foreground–background contrast, and complex background interference, we propose an enhanced Infrared_YOLO architecture based on the ‘You Only Look Once version 11’ (YOLOv11) framework. The architecture incorporates a feature fusion pyramid network that combines focused and diffused semantic propagation to enhance multi-scale feature integration while increasing backbone throughput. A dynamic feature enhancement module employs local multi-path cooperative attention to dynamically optimize both the offset fields and modulation masks of deformable convolutions, significantly improving the modeling of geometric deformation. An improved non-maximum suppression algorithm, coupled with similarity-preserving knowledge distillation, reduces redundant detections and false positives while strengthening the model’s generalization across diverse operational scenarios. Experiments on multiple heterogeneous datasets show that Infrared_YOLO consistently outperforms baseline models in robustness and generalization, providing an effective technical solution for post-disaster rescue, traffic monitoring, and urban planning applications.
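The dynamic feature enhancement module is only described here at a high level. As an illustrative sketch of the general idea (lightweight attention branches predicting the offset field and modulation mask of a modulated deformable convolution), the following PyTorch snippet shows one possible realization; the class name DynamicDeformBlock, the two-path layout, and the averaging fusion are assumptions for illustration, not the paper's actual implementation.

```python
# Illustrative sketch (not the paper's exact module): a 3x3 modulated
# deformable convolution whose offsets and modulation mask are predicted
# by two parallel ("multi-path") branches and fused by averaging.
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d


class DynamicDeformBlock(nn.Module):
    """Hypothetical dynamic feature enhancement block (assumed design)."""

    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        k = kernel_size
        self.k = k
        # Weight of the deformable convolution itself.
        self.weight = nn.Parameter(torch.empty(channels, channels, k, k))
        nn.init.kaiming_uniform_(self.weight, a=5 ** 0.5)
        out_offsets = 2 * k * k   # (dx, dy) per kernel sampling location
        out_masks = k * k         # one modulation scalar per location
        # Path 1: 3x3 local-context branch.
        self.path_local = nn.Conv2d(channels, out_offsets + out_masks, k,
                                    padding=k // 2)
        # Path 2: 1x1 pointwise (channel-mixing) branch.
        self.path_point = nn.Conv2d(channels, out_offsets + out_masks, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        k2 = self.k * self.k
        # Cooperative fusion of the two paths (simple average here).
        pred = 0.5 * (self.path_local(x) + self.path_point(x))
        offset, mask = pred[:, : 2 * k2], pred[:, 2 * k2:]
        mask = torch.sigmoid(mask)  # modulation mask constrained to [0, 1]
        return deform_conv2d(x, offset, self.weight,
                             padding=self.k // 2, mask=mask)


if __name__ == "__main__":
    block = DynamicDeformBlock(channels=64)
    feat = torch.randn(1, 64, 80, 80)   # e.g. a P3-level feature map
    print(block(feat).shape)            # torch.Size([1, 64, 80, 80])
```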