Abstract

Siamese trackers have recently achieved remarkable tracking performance. However, two challenges remain: accurately representing the features of targets that occupy different spatial regions, and exploiting the diverse temporal states of a target. Here, we propose SiamST, a Siamese network with spatio-temporal awareness. On the spatial side, standard square convolutional kernels and a single feature matching operation struggle to represent targets with different shapes accurately. We therefore design a refined region fusion module that combines multiple convolutional kernels to fit targets with different aspect ratios. Furthermore, we propose a multi-granularity matching module that obtains more robust feature matching results by combining fine-grained and coarse-grained matching results. On the temporal side, most existing Siamese trackers do not adequately exploit target temporal states. They usually only update the templates, which leads to a loss of motion information. We therefore build dynamic templates by screening high-quality samples to describe changes in target appearance accurately. In addition, we design a trend guidance module that adjusts the location prior constraint so that the tracking results match the target's motion trajectory. Extensive experiments on eight tracking benchmarks demonstrate that SiamST performs competitively against many advanced trackers.

