Abstract

Object detection is a pivotal task for many unmanned aerial vehicle (UAV) applications. Compared to general scenes, the objects in aerial images are typically much smaller. As a result, most general-purpose object detectors face two critical challenges when applied to aerial images: 1) The widely used Feature Pyramid Network (FPN) progressively integrates high-level features into lower levels. However, this scheme does not transfer equivalent information from each level of the backbone network to the generated features, and the shared detection head receives information from unbalanced sources, which degrades detection accuracy. 2) Up-sampling is commonly used to expand feature resolution for feature fusion or feature aggregation. However, existing up-sampling methods are ineffective at reconstructing high-resolution feature maps. To address these two challenges, two contributions are proposed: 1) an up-scale feature aggregation framework that fully utilizes multi-scale complementary information, and 2) a novel up-sampling method that further improves detection accuracy. Both are integrated into an end-to-end single-stage object detector named HawkNet. Extensive experiments are conducted on the VisDrone-DET2018, UAVDT and DIOR datasets. Compared to the RetinaNet baseline, our HawkNet achieves absolute gains of 6.0%, 1.2% and 5.9% in average precision (AP) on VisDrone-DET2018, UAVDT and DIOR, respectively. For an 800 × 1333 input on the UAVDT dataset, HawkNet with a ResNet-50 backbone surpasses existing methods under single-scale inference and achieves the best performance (37.4 AP), while operating at 10.6 frames per second on a single Nvidia GTX 1080Ti GPU.
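To make challenge 1) concrete, the following is a minimal sketch of the generic FPN top-down pathway, P_l = lateral(C_l) + up(P_{l+1}). It is not HawkNet's up-scale aggregation framework. Several simplifications are assumptions made for illustration only: feature maps are single-channel grids stored as nested Python lists, the lateral 1×1 convolution is replaced by identity, and up-sampling is nearest-neighbor 2×, as in the original FPN design. The helper names are hypothetical.

```python
from typing import List

Grid = List[List[float]]  # hypothetical single-channel feature map


def upsample_nearest_2x(x: Grid) -> Grid:
    """Nearest-neighbor 2x up-sampling: each value is copied into a 2x2 block."""
    out: Grid = []
    for row in x:
        wide = [v for v in row for _ in range(2)]  # duplicate each column
        out.append(wide)
        out.append(list(wide))                     # duplicate the row
    return out


def add(a: Grid, b: Grid) -> Grid:
    """Element-wise sum of two equally sized grids."""
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


def fpn_top_down(backbone: List[Grid]) -> List[Grid]:
    """Generic FPN top-down fusion (assumption: identity lateral connections).

    `backbone` is ordered fine-to-coarse, e.g. [C3, C4, C5], each level half the
    spatial size of the previous one. Returns [P3, P4, P5].
    """
    outputs = [backbone[-1]]                  # coarsest level starts the pathway
    for c in reversed(backbone[:-1]):         # walk from coarse to fine
        outputs.append(add(c, upsample_nearest_2x(outputs[-1])))
    outputs.reverse()                         # back to fine-to-coarse order
    return outputs


# Usage: only the coarsest level carries a signal.
c3 = [[0.0] * 4 for _ in range(4)]
c4 = [[0.0] * 2 for _ in range(2)]
c5 = [[1.0]]
p3, p4, p5 = fpn_top_down([c3, c4, c5])
```

In this toy run, the single C5 value reaches P3 only after two successive up-samplings and additions, while C3 enters P3 directly. This illustrates how the backbone levels contribute to each pyramid output through paths of different lengths.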
