Visual object tracking has been a major research topic in the field of computer vision for many years. Object tracking aims to identify and localize objects of interest in subsequent frames, given the bounding box of the first frame. In addition, the object-tracking algorithms are also required to have robustness and real-time performance. These requirements create some unique challenges, which can easily become overfitting if given a very small training dataset of objects during offline training. On the other hand, if there are too many iterations in the model-optimization process during offline training or in the model-update process during online tracking, it will cause the problem of poor real-time performance. We address these problems by introducing a meta-learning method based on fast optimization. Our proposed tracking architecture mainly contains two parts, one is the base learner and the other is the meta learner. The base learner is primarily a target and background classifier, in addition, there is an object bounding box prediction regression network. The primary goal of a meta learner based on the transformer is to learn the representations used by the classifier. The accuracy of our proposed algorithm on OTB2015 and LaSOT is 0.930 and 0.688, respectively. Moreover, it performs well on VOT2018 and GOT-10k datasets. Combined with the comparative experiments on real-time performance, our algorithm is fast and robust.
Read full abstract