Most existing trackers address the visual tracking problem by extracting an appearance template from the first frame and using it to localize the target in each subsequent frame. Unfortunately, they typically suffer from model degeneration, which easily leads to model drift and target loss. To address this issue, this paper proposes a novel Template Adjustment Siamese Network (TA-Siam). The TA-Siam framework consists of two simple subnetworks: the template adjustment subnetwork for feature extraction and the classification-regression subnetwork for bounding box prediction. The template adjustment module adaptively uses the features of subsequent frames to adjust the current template. This allows the template to adapt to variations in target appearance over long sequences and effectively overcomes the model drift problem of Siamese networks. To reduce classification errors, TA-Siam introduces rhombus labels. For more efficient learning and faster convergence, the tracker is trained with a more effective regression loss. Extensive experiments and comparisons with other trackers are conducted on challenging benchmarks including VOT2016, VOT2018, OTB50, OTB100, GOT-10K, and LaSOT. TA-Siam achieves state-of-the-art performance at a speed of 45 FPS.