Abstract

Recently, the Siamese tracking framework has been widely used in the visual tracking community. Siamese trackers usually use Cross-Correlation to aggregate template and search information, thereby encoding the target information. However, previous Cross-Correlation methods ignore either the object's channel semantic information or its local information. This seriously limits the representational power of the embedded correlation features and reduces tracker performance. In this paper, to solve these problems, we propose an effective correlation information mixer for visual target tracking. We design an information fusion network to efficiently aggregate template features and search features. In the information fusion network, we use Depthwise Cross-Correlation and Pointwise Cross-Correlation to extract the channel semantic information and the local information of the object, respectively, and use the correlation information mixer to fully fuse the two correlation maps into a stronger encoding of the target information. Extensive experimental results show that our tracker achieves competitive performance compared with other state-of-the-art trackers on four benchmarks: OTB, VOT, UAV123, and LaSOT.
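
To make the two correlation operators concrete, the following is a minimal sketch of how Depthwise and Pointwise Cross-Correlation are commonly defined in the Siamese tracking literature. It is an illustration under stated assumptions, not the implementation used in this paper: the function names, the nested-list tensor layout `[C][H][W]`, and the toy inputs are all hypothetical, and the correlation information mixer itself is not reproduced because the abstract does not specify its design.

```python
# Illustrative sketch only: names, shapes, and toy data are assumptions.
# Features are nested lists laid out as [channel][row][col].

def depthwise_xcorr(z, x):
    """Slide each template channel over the matching search channel.

    z: template features [C][Hz][Wz]
    x: search features   [C][Hx][Wx]
    returns one response map per channel: [C][Hx-Hz+1][Wx-Wz+1]
    """
    hz, wz = len(z[0]), len(z[0][0])
    hx, wx = len(x[0]), len(x[0][0])
    out = []
    for zc, xc in zip(z, x):  # channels are never mixed with each other
        out.append([
            [
                sum(zc[u][v] * xc[i + u][j + v]
                    for u in range(hz) for v in range(wz))
                for j in range(wx - wz + 1)
            ]
            for i in range(hx - hz + 1)
        ])
    return out


def pointwise_xcorr(z, x):
    """Use every template position as a 1x1 kernel over the search features.

    returns one response map per template position: [Hz*Wz][Hx][Wx]
    """
    c = len(z)
    hz, wz = len(z[0]), len(z[0][0])
    hx, wx = len(x[0]), len(x[0][0])
    out = []
    for u in range(hz):
        for v in range(wz):
            # Dot product over channels between one template location
            # and every search location.
            out.append([
                [sum(z[k][u][v] * x[k][i][j] for k in range(c))
                 for j in range(wx)]
                for i in range(hx)
            ])
    return out


# Toy example: 2 channels, 2x2 template, 3x3 search region.
z = [[[1, 0], [0, 1]], [[1, 1], [1, 1]]]
x = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
     [[1, 0, 0], [0, 1, 0], [0, 0, 1]]]

dw = depthwise_xcorr(z, x)  # shape [2][2][2]: per-channel responses
pw = pointwise_xcorr(z, x)  # shape [4][3][3]: per-template-position responses
```

In this formulation the depthwise map keeps one response per channel, so channel semantics are preserved while the template's spatial extent is summed out, whereas the pointwise map keeps one response per template location, so local detail is preserved while channels are summed out. The two maps also differ in shape, so any fusion step has to align them first. In a deep learning framework the depthwise case is typically written as a grouped convolution rather than explicit loops.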
