Abstract

Self Attention has shown the excellent performance in tracking due to its global modeling capability. However, it brings two challenges: First, its global receptive field has less attention on local structure and inter-channel associations, which limits the semantics to distinguish objects and backgrounds; Second, its feature fusion with linear process cannot avoid the interference of non-target semantic objects. To solve the above issues, this paper proposes a robust tracking method named GdaTFT by defining the Global Dilated Attention (GDA) and Target Focusing Network (TFN). The GDA provides a new global semantics modeling approach to enhance the semantic objects while eliminating the background. It is defined via the local focusing module, dilated attention and channel adaption module. Thus, it promotes semantics by focusing local key information, building long-range dependencies and enhancing the semantics of channels. Subsequently, to distinguish the target and non-target objects both with rich semantics, the TFN is proposed to accurately focus the target region. Different from the present feature fusion, it uses the template as the query to build a point-to-point correlation between the template and search region, and finally achieves part-level augmentation of target feature in the search region. Thus, the TFN efficiently augments the target embedding while weakening the non-target objects. Experiments on challenging benchmarks (LaSOT, TrackingNet, GOT-10k, OTB-100) demonstrate that the GdaTFT outperforms many state-of-the-art trackers and achieves leading performance. Code will be available.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call