Abstract

We propose SiamGauss, a Siamese region proposal network with a Gaussian head for single-target visual object tracking on aerial benchmarks. Visual tracking in aerial videos faces unique challenges: the large field of view results in small objects, similar-looking objects (confusers) appear in close proximity, targets are occluded, and simultaneous object and camera motion produces fast apparent motion. In Siamese tracking, a cross-correlation operation is performed in the embedding space to obtain a similarity map of the target within a search frame, which is then used to localize the target. During training, the proposed Gaussian head helps suppress the activation that confusers in the search frame produce in the similarity map, while boosting the confidence on the target. This activation suppression improves the confuser awareness of our tracker. In addition, the stronger activation on the target helps maintain tracking consistency under fast motion. The Gaussian head is applied only during training and introduces no additional computational overhead at inference time, so SiamGauss achieves fast runtime performance. We evaluate our method on multiple aerial benchmarks, showing that SiamGauss performs favorably against state-of-the-art trackers while operating at a frame rate of 96 frames per second.
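
The cross-correlation step described above can be illustrated with a minimal NumPy sketch. It is not the SiamGauss implementation: the function names `cross_correlate` and `localize`, the array shapes, and the argmax-based localization are assumptions made for illustration, and the Gaussian head itself is not modeled because the abstract does not specify it.

```python
import numpy as np

def cross_correlate(template, search):
    """Slide a template embedding over a search embedding.

    template: array of shape (C, h, w), the target embedding (hypothetical shape).
    search:   array of shape (C, H, W), the search-frame embedding.
    Returns a similarity map of shape (H - h + 1, W - w + 1).
    """
    _, h, w = template.shape
    _, H, W = search.shape
    similarity = np.empty((H - h + 1, W - w + 1))
    for i in range(similarity.shape[0]):
        for j in range(similarity.shape[1]):
            # Inner product between the template and the window at (i, j),
            # summed over channels and spatial positions.
            similarity[i, j] = np.sum(template * search[:, i:i + h, j:j + w])
    return similarity

def localize(similarity):
    """Return the (row, col) of the strongest response in the similarity map."""
    return np.unravel_index(np.argmax(similarity), similarity.shape)
```

In this simplified picture, a confuser is a second window whose embedding also correlates strongly with the template, which creates a competing peak in the similarity map; according to the abstract, the Gaussian head is what suppresses such peaks during training.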
