Abstract

AbstractThe use of deep neural networks has revolutionised object tracking tasks, and Siamese trackers have emerged as a prominent technique for this purpose. Existing Siamese trackers use a fixed template or template updating technique, but it is prone to overfitting, lacks the capacity to exploit global temporal sequences, and cannot utilise multi‐layer features. As a result, it is challenging to deal with dramatic appearance changes in complicated scenarios. Siamese trackers also struggle to learn background information, which impairs their discriminative ability. Hence, two transformer‐based modules, the Spatio‐Temporal Fusion (ST) module and the Discriminative Enhancement (DE) module, are proposed to improve the performance of Siamese trackers. The ST module leverages cross‐attention to accumulate global temporal cues and generates an attention matrix with ST similarity to enhance the template's adaptability to changes in target appearance. The DE module associates semantically similar points from the template and search area, thereby generating a learnable discriminative mask to enhance the discriminative ability of the Siamese trackers. In addition, a Multi‐Layer ST module (ST + ML) was constructed, which can be integrated into Siamese trackers based on multi‐layer cross‐correlation for further improvement. The authors evaluate the proposed modules on four public datasets and show comparative performance compared to existing Siamese trackers.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call