Abstract

In urban environment monitoring, visual tracking on unmanned aerial vehicles (UAVs) enables many applications owing to the platform's inherent advantages, but it also poses new challenges for existing visual tracking approaches, such as complex background clutter, rotation, fast motion, small objects, and real-time constraints arising from camera motion and viewpoint changes. Siamese-network trackers can operate efficiently on recent UAV datasets. Unfortunately, the learned convolutional neural network (CNN) features are not discriminative enough to separate the target from background clutter, in particular from distractors, and cannot capture the target's appearance variations over time. Occlusion and disappearance are further causes of tracking failure. In this paper, a semantic subspace module is designed and integrated into a Siamese network tracker to encode the local fine-grained details of the target for UAV tracking. More specifically, the target's semantic subspace is learned online to adapt to the target in the temporal domain. In addition, the pixel-wise response of the semantic subspace can be used to detect occlusion and disappearance of the target, which enables reasonable updating that relieves model drift. Extensive experiments on challenging UAV benchmarks show that the proposed method achieves competitive accuracy and efficiency on UAV videos.
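The abstract's occlusion-aware updating can be illustrated with a minimal sketch. The paper does not specify its decision rule, so the function name `update_gate`, the response threshold `tau`, and the visibility fraction `min_visible` below are all hypothetical; the sketch only shows the general idea of gating template updates on the fraction of target pixels whose semantic-subspace response remains high.

```python
import numpy as np

def update_gate(pixel_response, tau=0.3, min_visible=0.5):
    """Hypothetical occlusion check: treat pixels whose semantic-subspace
    response falls below tau as occluded, and allow a model update only
    when at least min_visible of the target pixels remain visible."""
    visible = np.mean(pixel_response > tau)
    return bool(visible >= min_visible)

resp_clear = np.full((8, 8), 0.9)      # target fully visible
resp_occluded = np.full((8, 8), 0.1)   # target mostly occluded...
resp_occluded[:2, :] = 0.9             # ...only the top two rows respond

print(update_gate(resp_clear))     # True  -> safe to update the template
print(update_gate(resp_occluded))  # False -> freeze the model to avoid drift
```

Freezing the update under heavy occlusion is what keeps a corrupted appearance observation out of the online model, which is the drift-relief mechanism the abstract describes.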

Highlights

  • Qualitative evaluation compares the proposed algorithm with SiamFC [20], EDCF [19], CCOT [47], and HCF [37] on the UAVDT [7] dataset (Figure 10), analyzing the trackers on typical videos containing challenges such as illumination change, scale change, background clutter, and rotation

  • Like the work in [20], the network is a pre-trained AlexNet [10], trained offline on ILSVRC2015 [9] to measure the similarity between the template and the search region, while a filter following the AlexNet backbone learns the target's semantic subspace online from data gathered while tracking on-the-fly

  • From 10 hours of raw video, the UAVDT [7] dataset provides about 80,000 representative frames, fully annotated with bounding boxes and up to 14 kinds of attributes for the single-object tracking task
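The similarity measurement described in the second highlight is the standard SiamFC-style cross-correlation: template features are slid over search-region features, and each response value is the dot product with the corresponding window. The sketch below assumes feature maps have already been extracted by the backbone; the function name and the toy shapes are illustrative, not from the paper.

```python
import numpy as np

def cross_correlation(z, x):
    """SiamFC-style similarity: slide template features z (C, h, w) over
    search features x (C, H, W); each response entry is the dot product
    between z and the aligned window of x."""
    C, h, w = z.shape
    _, H, W = x.shape
    resp = np.zeros((H - h + 1, W - w + 1))
    for i in range(H - h + 1):
        for j in range(W - w + 1):
            resp[i, j] = np.sum(z * x[:, i:i + h, j:j + w])
    return resp

# Toy example: cut the "template" out of the search region, so the
# response peak should localize it at the cut position.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 10, 10))
z = x[:, 3:6, 2:5].copy()
resp = cross_correlation(z, x)
peak = np.unravel_index(np.argmax(resp), resp.shape)
print(peak)  # (3, 2): the window matching the template exactly
```

In practice the correlation is computed in one pass as a convolution on GPU; the explicit loop here just makes the window-by-window dot product visible.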

