Abstract
In urban environment monitoring, visual tracking on unmanned aerial vehicles (UAVs) can produce more applications owing to the inherent advantages, but it also brings new challenges for existing visual tracking approaches (such as complex background clutters, rotation, fast motion, small objects, and realtime issues due to camera motion and viewpoint changes). Based on the Siamese network, tracking can be conducted efficiently in recent UAV datasets. Unfortunately, the learned convolutional neural network (CNN) features are not discriminative when identifying the target from the background/clutter, In particular for the distractor, and cannot capture the appearance variations temporally. Additionally, occlusion and disappearance are also reasons for tracking failure. In this paper, a semantic subspace module is designed to be integrated into the Siamese network tracker to encode the local fine-grained details of the target for UAV tracking. More specifically, the target’s semantic subspace is learned online to adapt to the target in the temporal domain. Additionally, the pixel-wise response of the semantic subspace can be used to detect occlusion and disappearance of the target, and this enables reasonable updating to relieve model drifting. Substantial experiments conducted on challenging UAV benchmarks illustrate that the proposed method can obtain competitive results in both accuracy and efficiency when they are applied to UAV videos.
Highlights
IntroductionClutter4.3.4. Qualitative EvaluationQualitative evaluation of the proposed algorithm with the other algorithms (including SiamFC [20], EDCF [19], CCOT [47], and HCF [37] in the UAVDT [7] dataset and the results are shown in Figure 10. Next, we will analyze the effects of different trackers in typical videos which contain challenges, such as light change, scale change, background clutter, and rotation. #000117 #000180 #000208 #000253 #000086 #000146 #000307 #000004 #000016
Like the work [20], the network is a pre-trained Alexnet network [10], which is trained offline to measure the similarity between the template and the search region based on the ILSVRC2015 [9], while a filter was followed by the Alexnet network to learn the target’s semantic subspace online according to the dataset gathered when tracking is on-the-fly
From 10 hours of raw videos, the UAVDT [7] database selected about 80,000 representative frames that were fully annotated with bounding boxes, as well as up to 14 kinds of attributes for the single object tracking task
Summary
Clutter4.3.4. Qualitative EvaluationQualitative evaluation of the proposed algorithm with the other algorithms (including SiamFC [20], EDCF [19], CCOT [47], and HCF [37] in the UAVDT [7] dataset and the results are shown in Figure 10. Next, we will analyze the effects of different trackers in typical videos which contain challenges, such as light change, scale change, background clutter, and rotation. #000117 #000180 #000208 #000253 #000086 #000146 #000307 #000004 #000016
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.