Visual object tracking is widely adopted in unmanned aerial vehicle (UAV) applications, which demand both reliable tracking precision and real-time performance. However, UAV trackers are highly susceptible to adversarial attacks, and research on effective adversarial defense methods for UAV tracking remains limited. To address these challenges, we propose CMDN, a novel pre-processing defense network that purifies adversarial perturbations by reconstructing video frames. The network learns robust visual representations from video frames, guided by meaningful features from both the search region and the template. Comprehensive experiments on three benchmarks demonstrate that CMDN enhances a UAV tracker’s adversarial robustness in both adaptive and non-adaptive attack scenarios. In addition, CMDN maintains stable defense effectiveness when transferred to heterogeneous trackers. Real-world tests on a UAV platform further validate its defense effectiveness and real-time performance, with CMDN achieving 27 FPS on an NVIDIA Jetson Orin 16 GB (25 W mode).