Abstract
Existing object detection models are typically designed without considering small-scale context, which makes small objects in Unmanned Aerial Vehicle (UAV) scenes difficult to detect. This paper therefore incorporates a novel hierarchical scale-aware module into the neck of the classical YOLO architecture. The module enhances object features hierarchically, progressing from small to large scales. Specifically, the proposed Small-Scale Awareness (SSA) module enhances the features of small-scale objects, while the Receptive Field Expansion (RFE) module models contextual information, expanding the receptive field while maintaining feature diversity for large-scale objects. In addition, a Stack of Non-Linear Mapping (SNM) module is proposed for the backbone; it uses deformable convolutions to fuse feature maps of diverse scales through a cascade of non-linear mapping units, capturing a wide range of contextual and discriminative information. Experimental results on the VisDrone dataset show that the proposed model outperforms state-of-the-art models on both the mean Average Precision (mAP) and Average Precision 50 (AP50) metrics. Ablation studies confirm that the proposed modules improve detection performance for objects in UAV scenes.
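To illustrate why small objects are hard to score well under AP50, the sketch below assumes the standard definition in which a detection counts as correct when its intersection over union (IoU) with the ground-truth box is at least 0.5. The box sizes, the 3-pixel shift, and the helper names `iou` and `shifted` are hypothetical choices made for illustration; they are not taken from the paper or from the VisDrone evaluation toolkit.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def shifted(box, d):
    """Return the box translated by d pixels along both axes."""
    x1, y1, x2, y2 = box
    return (x1 + d, y1 + d, x2 + d, y2 + d)


# Hypothetical ground-truth boxes: an 8x8 object and a 64x64 object.
small = (0, 0, 8, 8)
large = (0, 0, 64, 64)
shift = 3  # the same localization error, in pixels, applied to both

small_iou = iou(small, shifted(small, shift))
large_iou = iou(large, shifted(large, shift))

print(f"small object IoU: {small_iou:.3f}")
print(f"large object IoU: {large_iou:.3f}")
```

With these illustrative numbers, the same 3-pixel offset leaves the large box well above the 0.5 threshold but pushes the small box below it, so the small detection would be counted as a miss.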