Abstract

Siamese trackers have recently achieved remarkable tracking performance. However, two challenges remain: accurately representing the features of targets that occupy different spatial regions, and exploiting the diverse temporal states of a target. Here, we propose SiamST, a Siamese network with spatio-temporal awareness. Standard square convolutional kernels and a single feature matching operation struggle to represent targets of different shapes accurately. We therefore design a refined region fusion module that combines multiple convolutional kernels to fit targets with different aspect ratios. Furthermore, we propose a multi-granularity matching module that combines fine-grained and coarse-grained matching results to obtain more robust feature matching. In addition, most existing Siamese trackers do not adequately exploit target temporal states: they usually only update the templates, which inevitably loses motion information. We therefore build dynamic templates by screening high-quality samples to describe changes in target appearance accurately. We also design a trend guidance module that adjusts the location prior constraint so that the tracking results match the target's motion trajectory. Extensive experiments on eight tracking benchmarks demonstrate that SiamST performs competitively with many advanced trackers.
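
To make the idea of building dynamic templates by screening high-quality samples more concrete, the following is a minimal sketch and not the SiamST implementation. It assumes each tracking result carries a scalar confidence score and that "high quality" means exceeding a fixed threshold; the class name `TemplatePool`, the `capacity` and `min_score` parameters, and the score values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class TemplatePool:
    """Hypothetical pool of dynamic templates (illustrative only)."""
    capacity: int = 3        # assumed maximum number of dynamic templates
    min_score: float = 0.8   # assumed confidence threshold for "high quality"
    # Each entry is (confidence score, frame index).
    items: List[Tuple[float, int]] = field(default_factory=list)

    def update(self, score: float, frame_idx: int) -> bool:
        """Screen one tracking result; return True if it was kept."""
        if score < self.min_score:
            return False     # low-confidence result: do not add it as a template
        self.items.append((score, frame_idx))
        # Keep only the highest-scoring samples; newer frames win ties.
        self.items.sort(key=lambda item: (item[0], item[1]), reverse=True)
        del self.items[self.capacity:]
        return (score, frame_idx) in self.items


pool = TemplatePool()
# Made-up per-frame confidence scores, for illustration only.
for frame_idx, score in enumerate([0.95, 0.40, 0.85, 0.90, 0.82, 0.99]):
    pool.update(score, frame_idx)

kept_frames = [frame_idx for _, frame_idx in pool.items]
print(kept_frames)
```

In this sketch the pool ends up holding the three most confident frames, and a frame whose score falls below the threshold never enters it. A real tracker would store template features rather than frame indices and might use a different quality criterion.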
