Abstract
Most prevalent trackers rely on increasingly powerful appearance models, aiming to learn more discriminative deep representations for a reliable response map. Nevertheless, obtaining an accurate response map is far from trivial on account of diverse challenges. What is more, discriminative appearance model based trackers with an online update component require dynamic samples to update the target classifier, which inevitably incorporates imprecise tracking results into the model and thus reduces its discriminative ability. To alleviate this problem, we propose a novel verification mechanism via a Target Embedding Network, whose intention is to learn general target embedding features offline that increase the similarity between instances of the same target while increasing the dissimilarity between different targets. In particular, we devise a simple selection strategy for negative samples and adopt a multiple triplet loss to train the network effectively. Furthermore, we adopt cosine similarity to measure the agreement between the target embedding features of the initial frame and those of the current frame. By comparing the similarity score against piecewise thresholds, this method retains the discriminative ability of the tracker by controlling the update of the sample memory and the learning rate. We replace the hard sample mining strategy used by the recent SuperDiMP and conduct comprehensive experiments and analyses of our approach on six public datasets. Extensive experiments demonstrate that the discriminative ability of the model can be effectively maintained by the proposed method, achieving superior performance against state-of-the-art trackers. The code, raw tracking results, and trained models will be released at https://github.com/hexdjx/VisTrack.
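As a rough illustration of the verification step described above, the following PyTorch-style sketch compares the cosine similarity between the initial and current target embeddings against two piecewise thresholds to decide whether the current sample enters the memory and which learning rate is used. The threshold values, learning rates, and function name are illustrative assumptions, not the paper's exact settings.

```python
import torch
import torch.nn.functional as F

# Hypothetical thresholds and learning rates -- illustrative only,
# not the values used in the paper.
HIGH_THRESHOLD = 0.8   # confident match: update memory at the normal rate
LOW_THRESHOLD = 0.5    # uncertain match: keep the sample but learn cautiously
DEFAULT_LR = 0.01
REDUCED_LR = 0.001

def verify_target(init_embedding: torch.Tensor, cur_embedding: torch.Tensor):
    """Compare the current target embedding with the initial one and
    decide how (or whether) to update the tracker's sample memory.

    Returns (update_memory, learning_rate).
    """
    # Cosine similarity between the two embedding vectors.
    score = F.cosine_similarity(init_embedding.flatten().unsqueeze(0),
                                cur_embedding.flatten().unsqueeze(0)).item()

    if score >= HIGH_THRESHOLD:
        # Reliable result: store the sample and update at the normal rate.
        return True, DEFAULT_LR
    elif score >= LOW_THRESHOLD:
        # Ambiguous result: store the sample but reduce the learning rate.
        return True, REDUCED_LR
    else:
        # Likely a distractor or occlusion: skip the model update.
        return False, 0.0
```

In this sketch, low-similarity frames are prevented from contaminating the sample memory, which mirrors the stated goal of keeping imprecise tracking results out of the online classifier update.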