Abstract

Visual tracking is one of the most fundamental and active research topics in computer vision, with broad industrial applications. It must solve two core problems: classification and state estimation. Most existing trackers employ deep networks to extract object features. In particular, Siamese-based approaches have prevailed in tracking tasks; they generate labels for both positive and negative samples. However, these approaches also introduce ambiguities and inaccurate semantic information, which may cause classification failures. To address this problem, we present SiamRank, which incorporates the sequential information of different samples within one image. We apply the proposed network to two backbones (AlexNet and GoogLeNet) to verify its generality. Extensive experiments on seven popular benchmarks, including OTB100, LaSOT, GOT-10k, TrackingNet, NFS, UAV123 and VOT2019, show that our tracker achieves state-of-the-art results. In particular, on both the large-scale TrackingNet dataset and the long-term LaSOT dataset, SiamRank surpasses previous approaches with a relative gain of 10%, while running at 65 FPS.
