Abstract

Visual object tracking by Siamese networks has achieved favorable performance in accuracy and speed. However, the features used in Siamese networks contain spatially redundant information, which increases computation and limits the discriminative ability of Siamese networks. To address this issue, we present a novel frequency-aware feature (FAF) method for robust visual object tracking in complex scenes. Unlike previous works, which select features from different channels or layers, the proposed method factorizes the feature map into multiple frequencies and reduces the low-frequency information that is spatially redundant. By reducing the resolution of the low-frequency map, computation is saved and the receptive field of the layer is also enlarged, yielding more discriminative information. To further improve the performance of the FAF, we design an innovative data-independent augmentation for object tracking that improves the discriminative ability of the tracker by enhancing linear representations among training samples through convex combinations of the images and their tags. Finally, a joint judgment strategy is proposed to adjust the bounding-box result by combining intersection-over-union (IoU) and classification scores to improve tracking accuracy. Extensive experiments on 5 challenging benchmarks demonstrate that our FAF method performs favorably against state-of-the-art tracking methods while running at around 45 frames per second.
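The multi-frequency factorization described above can be sketched in the style of an octave-like channel split: part of the channels form a low-frequency map kept at half resolution, which cuts computation and widens the effective receptive field. This is a minimal NumPy illustration with hypothetical shapes and a hypothetical `alpha` split ratio, not the authors' implementation:

```python
import numpy as np

def split_frequency(feat, alpha=0.5):
    """Split a feature map (C, H, W) into a full-resolution high-frequency
    part and a half-resolution low-frequency part.

    `alpha` (hypothetical hyper-parameter, not taken from the paper) is the
    fraction of channels assigned to the low-frequency map.
    """
    c = feat.shape[0]
    c_low = int(alpha * c)
    high = feat[c_low:]                      # (C - C_low, H, W), full resolution
    low_full = feat[:c_low]                  # (C_low, H, W)
    # 2x2 average pooling halves the spatial resolution of the low-frequency
    # map, saving computation on the spatially redundant component.
    h, w = low_full.shape[1] // 2, low_full.shape[2] // 2
    low = low_full[:, :2 * h, :2 * w].reshape(c_low, h, 2, w, 2).mean(axis=(2, 4))
    return high, low

feat = np.random.rand(64, 32, 32)
high, low = split_frequency(feat, alpha=0.5)
print(high.shape, low.shape)  # (32, 32, 32) (32, 16, 16)
```

Because the low-frequency branch holds half the channels at a quarter of the spatial positions, subsequent convolutions on it cost roughly a quarter of the full-resolution compute for those channels.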

Highlights

  • In recent years, visual object tracking, as a fundamental problem in computer vision, has been widely studied and applied to unmanned vehicles, traffic surveillance, and intelligent transportation

  • Unlike existing tracking methods that select features from different layers or channels, we innovatively introduce frequency-aware features into object tracking, which improves the model’s discriminative ability while reducing feature computation

  • To comprehensively verify the efficiency of the frequency-aware feature (FAF) method, extensive experiments are conducted on 5 well-known benchmarks; the results show that the proposed FAF outperforms state-of-the-art trackers while running at 45 fps


Summary

Introduction

Visual object tracking, as a fundamental problem in computer vision, has been widely studied and applied to unmanned vehicles, traffic surveillance, and intelligent transportation. The tracking targets have shifted from traditional vehicles, pedestrians, and other large objects to arbitrary small objects in complex scenes (such as background clutter, illumination variation, scale variation, low resolution, occlusion, and fast motion), which are harder to predict. To address this issue, deep learning models with strong discriminative power have been introduced to design robust, real-time tracking methods for complex scenes.

Input: the image M, the ground-truth bounding box P(x, y, w, h), the number of fusion samples N_fus, the number of negative samples N_neg, the number of positive samples N_pos, and the interpolation strength parameter α.
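The input list above (fusion samples and an interpolation strength α) corresponds to the data-independent augmentation from the abstract, which forms convex combinations of images and their tags. A minimal mixup-style sketch in NumPy, with a Beta(α, α) mixing coefficient; this is an assumed formulation for illustration, not the authors' code:

```python
import numpy as np

def fuse_samples(img_a, label_a, img_b, label_b, alpha=0.2, rng=None):
    """Convex combination of two training samples and their tags.

    `alpha` is the interpolation strength: the mixing coefficient lam is
    drawn from Beta(alpha, alpha), so small alpha keeps lam near 0 or 1.
    """
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)              # mixing coefficient in (0, 1)
    img = lam * img_a + (1.0 - lam) * img_b   # fused image
    label = lam * label_a + (1.0 - lam) * label_b  # fused tag
    return img, label, lam

# Fusing an all-zeros "negative" with an all-ones "positive":
a, b = np.zeros((8, 8, 3)), np.ones((8, 8, 3))
img, label, lam = fuse_samples(a, 0.0, b, 1.0, alpha=0.5)
print(np.allclose(img.mean(), label))  # True: image mix matches tag mix
```

Repeating this N_fus times over pairs drawn from the N_pos positive and N_neg negative samples would produce the fused training set the input list refers to.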


