Abstract

Adversarial attacks on visual object tracking aim to fool trackers by injecting imperceptible perturbations into video frames. Most adversarial methods generate perturbations for every video frame, but such frequent attacks increase both the computational load and the risk of exposure. Unfortunately, few works attack only the initial frame, and their attack effects are insufficient. To address this, we focus on the initialization phase of tracking and propose an only-once attack framework, which effectively fools the tracker by generating invisible perturbations for the initial template alone, rather than for every frame. Specifically, considering the tracking mechanism of Siamese-based trackers, we design a minimum score-based and a minimum IoU-based loss function. Both are used to train a UNet-based perturbation generator, rather than the tracker itself, achieving a non-targeted attack. Additionally, we propose location and direction offsets as the basic attacks underlying a more sophisticated targeted attack; by combining these two basic attacks, the tracker can easily be hijacked to move toward a fake target predefined by the user. Extensive experimental results demonstrate that our only-once attack framework requires the fewest attacks yet achieves a better attack effect, with a maximum performance drop of 68.7%. Transferability experiments show that our attack framework generalizes well and is directly applicable to CNN-based, Siamese-based, deep discriminative-based, and Transformer-based trackers without retraining.
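As a rough illustration only: the abstract does not give the exact loss formulas, so the sketch below shows one plausible way to set up a minimum score-based and a minimum IoU-based objective for training a perturbation generator against a frozen Siamese tracker. The box_iou helper, the margin/hinge form of the score loss, and the commented training step (the generator and tracker calls) are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn.functional as F

def box_iou(pred, gt):
    # IoU between predicted boxes (N, 4) and one ground-truth box (4,),
    # both in (x1, y1, x2, y2) format.
    x1 = torch.max(pred[:, 0], gt[0])
    y1 = torch.max(pred[:, 1], gt[1])
    x2 = torch.min(pred[:, 2], gt[2])
    y2 = torch.min(pred[:, 3], gt[3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    return inter / (area_p + area_g - inter + 1e-6)

def min_score_loss(response, gt_mask, margin=0.1):
    # Non-targeted attack: push the response scores at the true target's
    # positions below the background scores (the hinge form is an assumption).
    pos = response[gt_mask].mean()
    neg = response[~gt_mask].mean()
    return F.relu(pos - neg + margin)

def min_iou_loss(pred_boxes, gt_box):
    # Non-targeted attack: directly minimize the overlap between the
    # attacked predictions and the ground-truth box.
    return box_iou(pred_boxes, gt_box).mean()

# Hypothetical training step: only the UNet-style generator is updated,
# while the tracker itself stays frozen.
#   perturbation = generator(template)                   # UNet output
#   adv_template = (template + perturbation).clamp(0, 1)
#   response, pred_boxes = tracker(adv_template, search_region)
#   loss = min_score_loss(response, gt_mask) + min_iou_loss(pred_boxes, gt_box)
#   loss.backward(); optimizer.step()                    # optimizer over generator params

At test time, the trained generator would perturb only the initial template, so no further attack is needed on subsequent frames.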
