Abstract

Deep learning-based tracking methods have shown favorable performance on multiple benchmarks. However, most of these methods are not designed for real-time video surveillance systems because they rely on a complex online optimization process. In this article, we propose a single-shot adversarial tracker (SAT) to efficiently locate objects of interest in surveillance videos. Specifically, we propose a lightweight convolutional neural network-based generator, which fuses multilayer feature maps to accurately generate the target probability map (TPM) for tracking. To train the generator more effectively, we present an adversarial learning framework. During the online tracking stage, the learned TPM generator can be directly employed to generate the target probability map corresponding to the search region in a single shot. The proposed SAT achieves an average tracking speed of 212 FPS on a single GPU while still achieving favorable performance on several popular benchmarks. Furthermore, we present a variant of SAT that incorporates both scale estimation and online updating, which achieves better accuracy than SAT while still maintaining a very fast tracking speed (i.e., exceeding 100 FPS).
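To make the role of the TPM concrete, the following minimal sketch shows one way a probability map over the search region could be converted into a bounding box in frame coordinates. The abstract does not specify this step, so the thresholding rule, the function name `locate_target`, and the `threshold` parameter are all illustrative assumptions rather than the method used by SAT.

```python
def locate_target(tpm, region_xywh, threshold=0.5):
    """Return an (x, y, w, h) box in frame coordinates, or None if no cell passes.

    tpm: 2D list of per-cell target probabilities over the search region.
    region_xywh: search region in frame coordinates as (x, y, width, height).
    threshold: hypothetical cutoff; the abstract does not state how the
               box is derived from the map.
    """
    x0, y0, w, h = region_xywh
    rows, cols = len(tpm), len(tpm[0])

    # Collect the cells whose probability reaches the cutoff.
    hits = [(r, c) for r in range(rows) for c in range(cols)
            if tpm[r][c] >= threshold]
    if not hits:
        return None

    # Size of one map cell in frame pixels.
    sx, sy = w / cols, h / rows

    r_min = min(r for r, _ in hits)
    r_max = max(r for r, _ in hits)
    c_min = min(c for _, c in hits)
    c_max = max(c for _, c in hits)

    # Tight box around the selected cells, mapped back to the frame.
    left = x0 + c_min * sx
    top = y0 + r_min * sy
    width = (c_max - c_min + 1) * sx
    height = (r_max - r_min + 1) * sy
    return (left, top, width, height)


# Toy 10x10 map with a high-probability block, standing in for generator output.
tpm = [[0.0] * 10 for _ in range(10)]
for r in range(2, 5):
    for c in range(3, 7):
        tpm[r][c] = 0.9

box = locate_target(tpm, region_xywh=(100, 50, 20, 20))
```

Because the map is produced in one forward pass, a post-processing step of this kind is the only per-frame work besides the generator itself, which is consistent with the single-shot design described above.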

