Abstract

Siamese trackers have recently achieved remarkable performance in the visual tracking community. However, they mainly rely on features from a single convolutional layer for feature representation, which is not discriminative enough for complicated scenarios, especially in the presence of background distractors, and limits precise localization. To address this issue, we propose a novel hierarchical Siamese tracking network that exploits multiple convolutional layers, cascaded from deeper to earlier layers, for more discriminative tracking. To take advantage of deep and shallow features effectively, we design a pyramid feature fusion module that fuses them, improving the tracker's ability to distinguish the target from the background. To further improve robustness, we design location-aware prediction heads that define bounding box regression and cascade from high-level to low-level layers of the proposed hierarchical Siamese network, maintaining precise localization during tracking. Experimental results on visual tracking benchmarks, including VOT2018, VOT2019, OTB2015, UAV123, and LaSOT, demonstrate that the proposed method achieves state-of-the-art results while running at 34.2 FPS, confirming its effectiveness and efficiency.
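The core idea of combining deep and shallow convolutional features can be illustrated with a minimal sketch. The paper's actual fusion module is not specified in the abstract, so the `pyramid_fuse` and `upsample_nn` functions below, the weighted-sum fusion rule, and the toy feature shapes are all illustrative assumptions: deeper feature maps are spatially smaller, so they are upsampled to the shallow map's resolution before being combined.

```python
import numpy as np

def upsample_nn(feat, scale):
    """Nearest-neighbor upsampling of a (C, H, W) feature map (illustrative)."""
    return feat.repeat(scale, axis=1).repeat(scale, axis=2)

def pyramid_fuse(shallow, mid, deep, weights=(0.3, 0.3, 0.4)):
    """Hypothetical fusion of three conv levels at the shallow resolution.

    Deeper maps are spatially smaller, so they are upsampled to the
    shallow map's size before a weighted sum. The weights are arbitrary
    placeholders; a real module would typically learn them.
    """
    mid_up = upsample_nn(mid, shallow.shape[1] // mid.shape[1])
    deep_up = upsample_nn(deep, shallow.shape[1] // deep.shape[1])
    w1, w2, w3 = weights
    return w1 * shallow + w2 * mid_up + w3 * deep_up

# Toy feature maps: same channel count, spatial size halved per level.
shallow = np.random.rand(8, 32, 32)
mid = np.random.rand(8, 16, 16)
deep = np.random.rand(8, 8, 8)

fused = pyramid_fuse(shallow, mid, deep)
print(fused.shape)  # (8, 32, 32)
```

The fused map retains the shallow level's spatial resolution (useful for localization) while injecting the deeper levels' semantic content (useful for distinguishing the target from background distractors), which is the motivation the abstract gives for hierarchical fusion.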
