Traditional re-identification (Re-ID) methods typically search for targets across multiple cameras that share a similar viewpoint. However, collaboration between fixed cameras and unmanned aerial vehicles (UAVs) is becoming a new trend in the surveillance field, and the large viewpoint gap between fixed and UAV-mounted cameras poses unprecedented challenges for Re-ID. Although person Re-ID models have achieved significant progress in single-viewpoint settings, their performance deteriorates markedly under drastic viewpoint changes, such as transitions from aerial to ground-level views. This degradation stems chiefly from the stark variation between viewpoints and the accompanying differences in subject pose and background. Existing methods that focus on learning local features prove suboptimal for cross-view Re-ID: the top-down drone viewpoint introduces perspective distortion, while the ground-level viewpoint captures richer and more detailed textures, producing notable discrepancies in local features. To address this issue, this study introduces a Multi-scale Across View Model (MAVM) that extracts features at multiple scales to generate a richer and more robust feature representation. We further incorporate a Cross-View Alignment Module (AVAM) that fine-tunes attention weights, sharpening the model's response to critical regions such as the silhouette, attire textures, and other key features, so that recognition accuracy remains high under changes in pose and lighting. Extensive experiments on the public AG-ReID dataset demonstrate the superiority of the proposed method, which significantly outperforms existing state-of-the-art techniques.
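The abstract gives no implementation details, so the sketch below is only a minimal illustration of the two ideas it names: multi-scale feature extraction over a backbone feature map, and attention-based re-weighting of region features in the spirit of the alignment module. The class name, the pooling scales, the fusion scheme, and all dimensions are illustrative assumptions, not the authors' actual MAVM/AVAM architecture.

```python
# Minimal sketch (NOT the authors' code): multi-scale pooling over a backbone
# feature map, followed by attention-based re-weighting intended to emphasize
# regions that survive the aerial/ground viewpoint change. All names, sizes,
# and the fusion scheme are illustrative assumptions.
import torch
import torch.nn as nn


class MultiScaleAttnReID(nn.Module):
    def __init__(self, in_channels: int = 2048, embed_dim: int = 512,
                 scales: tuple = (1, 2, 4), num_heads: int = 8):
        super().__init__()
        # One pooling branch per scale: an (s x s) grid of region descriptors.
        self.pools = nn.ModuleList(nn.AdaptiveAvgPool2d(s) for s in scales)
        self.proj = nn.Linear(in_channels, embed_dim)
        # Self-attention over the pooled region tokens; in the paper this
        # role is played by the cross-view alignment module (AVAM).
        self.attn = nn.MultiheadAttention(embed_dim, num_heads,
                                          batch_first=True)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, fmap: torch.Tensor) -> torch.Tensor:
        # fmap: backbone output, shape (B, C, H, W)
        tokens = []
        for pool in self.pools:
            p = pool(fmap)                               # (B, C, s, s)
            tokens.append(p.flatten(2).transpose(1, 2))  # (B, s*s, C)
        x = self.proj(torch.cat(tokens, dim=1))          # (B, sum(s*s), D)
        # Attention re-weights tokens so that view-stable cues (silhouette,
        # clothing texture) dominate the final pooled embedding.
        attn_out, _ = self.attn(x, x, x)
        x = self.norm(x + attn_out)
        return x.mean(dim=1)                             # (B, D) embedding


if __name__ == "__main__":
    feat = torch.randn(2, 2048, 16, 8)  # e.g. a ResNet-50 stage-4 output
    emb = MultiScaleAttnReID()(feat)
    print(emb.shape)                    # torch.Size([2, 512])
```

In an actual cross-view pipeline, embeddings from the aerial and ground branches would be trained jointly (e.g., with identity and metric losses) so that the attention learns which regions remain discriminative across the two viewpoints; those training details are not specified in the abstract.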