Abstract

Local appearance representation plays a key role in robust visual tracking. However, holistic appearance representations dominate the feature extraction models of existing DNN-based trackers, leaving them highly sensitive to appearance variations such as rotation, appearance change, and partial occlusion. We introduce a spatial transformer local part detector that provides a more robust local representation of the tracking object and combines with existing Siamese networks for visual tracking. To this end, we employ spatial transformer networks in the local part detector to locate discriminative regions of the tracking object. The local part detector passes its detections to a local cropping module, which constructs the local structure for each region of interest. Because all operations involved are differentiable, local structure construction can be learned end to end. Tracking is then performed by matching the combined local patterns of candidates and templates in a Siamese network. We extensively validate the proposed method through experiments and ablation studies on tracking benchmarks, including OTB-2015, VOT-2018, and UAV123.
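The differentiable cropping described above can be illustrated with a minimal sketch of the grid-generation and bilinear-sampling steps used by spatial transformer networks. This is an illustrative NumPy implementation under assumed conventions (normalized [-1, 1] coordinates, a single-channel feature map, a 2x3 affine matrix `theta`), not the paper's actual code:

```python
import numpy as np

def affine_grid(theta, out_h, out_w):
    # Build a sampling grid in normalized [-1, 1] coordinates from a
    # 2x3 affine matrix, as in spatial transformer networks.
    ys, xs = np.meshgrid(np.linspace(-1, 1, out_h),
                         np.linspace(-1, 1, out_w), indexing="ij")
    coords = np.stack([xs, ys, np.ones_like(xs)], axis=-1)  # (H, W, 3)
    return coords @ theta.T  # (H, W, 2) source (x, y) locations

def bilinear_sample(feat, grid):
    # Bilinear sampling of feat (H, W) at grid locations; every step is a
    # smooth function of theta, so gradients flow through the crop.
    H, W = feat.shape
    x = (grid[..., 0] + 1) * (W - 1) / 2
    y = (grid[..., 1] + 1) * (H - 1) / 2
    x0 = np.clip(np.floor(x).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, H - 2)
    wx, wy = x - x0, y - y0
    return (feat[y0, x0] * (1 - wx) * (1 - wy) +
            feat[y0, x0 + 1] * wx * (1 - wy) +
            feat[y0 + 1, x0] * (1 - wx) * wy +
            feat[y0 + 1, x0 + 1] * wx * wy)

# Identity transform: the crop reproduces the input patch exactly.
feat = np.arange(16, dtype=float).reshape(4, 4)
theta = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])
crop = bilinear_sample(feat, affine_grid(theta, 4, 4))
```

In practice, `theta` would be predicted by the local part detector for each discriminative region, so the crop location itself is trained by the matching loss.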
