Existing UAV-based pedestrian detection datasets suffer from insufficient samples, limited scene diversity, missing perspectives, and low resolution. To address these issues, this paper proposes a UAV-based pedestrian detection benchmark dataset named Novel Surveillance View (NSV). The dataset covers diverse scenes and captures pedestrians from multiple perspectives. The paper also introduces a data mining approach that leverages tracking and optical flow information, which significantly improves data acquisition efficiency while ensuring annotation quality. Furthermore, an improved pedestrian detection method is proposed to overcome the performance degradation caused by significant perspective changes in top-down UAV views. First, the View-Agnostic Decomposition (VAD) module decouples features into perspective-dependent and perspective-independent branches to improve the model’s generalization across perspective variations. Second, the Deformable Conv-BN-SiLU (DCBS) module dynamically adjusts the shape of the receptive field to better adapt to the geometric deformations of pedestrians. Finally, the Context-Aware Pyramid Spatial Attention (CPSA) module integrates multi-scale features with attention mechanisms to handle drastic variations in target scale. Experimental results show that the proposed method improves the mean Average Precision (mAP) by 9% on the NSV dataset, indicating that explicitly modeling perspective-related features effectively improves pedestrian detection accuracy from UAV perspectives.
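For readers unfamiliar with the reported metric, the following is a minimal sketch of how Average Precision can be computed for a single class such as pedestrian. It is not taken from the paper: the IoU threshold of 0.5, the PASCAL VOC style matching rule, the all-point interpolation, and the names `iou` and `average_precision` are all assumptions made for illustration, and the evaluation protocol actually used on NSV may differ.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed layout


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(
    dets: List[Tuple[float, Box]],  # (confidence, box) pairs
    gts: List[Box],                 # ground-truth boxes
    iou_thr: float = 0.5,           # assumed threshold, not from the paper
) -> float:
    """AP for one class on one image, using all-point interpolation."""
    if not gts:
        return 0.0
    matched = [False] * len(gts)
    tp_flags = []
    # Visit detections from most to least confident.
    for _score, box in sorted(dets, key=lambda d: d[0], reverse=True):
        # Find the ground-truth box with the highest overlap.
        best_j, best_iou = -1, 0.0
        for j, gt in enumerate(gts):
            v = iou(box, gt)
            if v > best_iou:
                best_j, best_iou = j, v
        # True positive only if overlap is sufficient and the box is unclaimed.
        if best_j >= 0 and best_iou >= iou_thr and not matched[best_j]:
            matched[best_j] = True
            tp_flags.append(True)
        else:
            tp_flags.append(False)
    # Cumulative precision and recall after each detection.
    precisions, recalls, tp = [], [], 0
    for k, flag in enumerate(tp_flags, start=1):
        tp += int(flag)
        precisions.append(tp / k)
        recalls.append(tp / len(gts))
    # Make the precision envelope non-increasing from right to left.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the precision-recall envelope.
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_r) * p
        prev_r = r
    return ap
```

With a single pedestrian class, mAP reduces to this AP, possibly averaged over several IoU thresholds depending on the benchmark convention. A real evaluation would also pool detections across all images before ranking them, which this per-image sketch omits for brevity.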