Current mainstream computer vision algorithms focus on designing suitable network architectures and loss functions to fit training data. However, the accuracy of small object detection remains lower than for other scales, and the design of convolution operators limits the model’s performance. For UAV small object detection, standard convolutions, due to their fixed kernel size, cannot adaptively capture small object spatial information. Many convolutional variants have scattered sampling points, leading to blurred boundaries and reduced accuracy. In response, we propose HawkEye Conv (HEConv), which utilizes stable sampling and dynamic offsets with random selection. By varying the convolution kernel design, HEConv reduces the accuracy gap between small and larger objects while offering multiple versions and plug-and-play capabilities. We also develop HawkEye Spatial Pyramid Pooling and Gradual Dynamic Feature Pyramid Network modules to validate HEConv. Experiments on the RFRB agricultural and VisDrone2019 urban datasets demonstrate that, compared to YOLOv10, our model improves AP50 by 11.9% and 6.2%, APS by 11.5% and 5%, and F1-score by 5% and 7%. Importantly, it enhances small object detection without sacrificing large object accuracy, thereby reducing the multi-scale performance gap.
Read full abstract