Abstract

Visual tracking in aerial videos is a challenging task in computer vision and remote sensing due to appearance variations, which are caused by camera and target motion, low-resolution and noisy images, scale changes, and pose variations. Various approaches have been proposed to deal with appearance variations in aerial videos, and among these, the spatiotemporal saliency detection approach has reported promising results for moving target detection. However, it is not accurate enough when visual tracking is performed under appearance variations. In this study, a visual tracking method is proposed that combines spatiotemporal saliency with discriminative online learning to deal with appearance variation difficulties. Temporal saliency represents moving target regions and is extracted from frame differences thresholded with the Sauvola local adaptive thresholding algorithm. Spatial saliency represents the target appearance details within the candidate moving regions: SLIC superpixel segmentation together with color and moment features is used to compute feature uniqueness and spatial compactness saliency measurements. Because this is a time-consuming process, a parallel algorithm was developed to distribute the saliency detection workload across multiple processors. Spatiotemporal saliency is then obtained by combining the temporal and spatial saliencies to represent moving targets. Finally, a discriminative online learning algorithm generates a sample model from the spatiotemporal saliency, and this model is incrementally updated to detect the target under appearance variations. Experiments conducted on the VIVID dataset demonstrate that the proposed visual tracking method is effective and computationally efficient compared to state-of-the-art methods.
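
As a minimal sketch of the saliency stage outlined above, assuming OpenCV and scikit-image for frame differencing, Sauvola thresholding, and SLIC, a standard uniqueness/compactness formulation for the superpixel saliency, and element-wise multiplication as the fusion step (none of which are specified in this excerpt, and all parameter values are illustrative):

```python
# Illustrative sketch only; window sizes, segment counts, and the fusion rule
# are assumptions, not the paper's reported settings.
import numpy as np
import cv2
from skimage.filters import threshold_sauvola
from skimage.segmentation import slic


def temporal_saliency(prev_gray, curr_gray, window_size=25):
    """Moving-region candidates from frame differencing + Sauvola local thresholding."""
    diff = cv2.absdiff(curr_gray, prev_gray).astype(np.float64)
    thresh = threshold_sauvola(diff, window_size=window_size)  # local adaptive threshold map
    return (diff > thresh).astype(np.float64)                  # binary motion mask


def spatial_saliency(frame_bgr, n_segments=200):
    """Per-superpixel saliency from color uniqueness and spatial compactness."""
    lab = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2LAB).astype(np.float64)
    labels = slic(frame_bgr, n_segments=n_segments, compactness=10, start_label=0)
    ids = np.unique(labels)

    # Mean color and mean position (first-order moments) of each superpixel.
    colors = np.array([lab[labels == i].mean(axis=0) for i in ids])
    ys, xs = np.indices(labels.shape)
    pos = np.array([[ys[labels == i].mean(), xs[labels == i].mean()] for i in ids])

    col_dist = np.linalg.norm(colors[:, None] - colors[None, :], axis=2)
    pos_dist = np.linalg.norm(pos[:, None] - pos[None, :], axis=2)

    # Uniqueness: superpixels whose color differs from nearby superpixels are salient.
    w_pos = np.exp(-pos_dist / (pos_dist.max() + 1e-9))
    uniqueness = (w_pos * col_dist).sum(axis=1) / (w_pos.sum(axis=1) + 1e-9)

    # Compactness: salient colors should be spatially concentrated (low spread).
    w_col = np.exp(-col_dist / (col_dist.max() + 1e-9))
    spread = (w_col * pos_dist).sum(axis=1) / (w_col.sum(axis=1) + 1e-9)
    compactness = 1.0 - spread / (spread.max() + 1e-9)

    score = uniqueness * compactness
    score = (score - score.min()) / (score.max() - score.min() + 1e-9)

    lut = np.zeros(labels.max() + 1)
    lut[ids] = score
    return lut[labels]                                          # per-pixel saliency map


def spatiotemporal_saliency(prev_gray, curr_gray, frame_bgr):
    """Fuse the maps so that only salient pixels inside moving regions survive."""
    return temporal_saliency(prev_gray, curr_gray) * spatial_saliency(frame_bgr)
```

In this reading, the temporal mask restricts attention to candidate moving regions, while the superpixel-level uniqueness and compactness terms recover the target's appearance detail inside those regions; the superpixel loop is also the natural unit of work to distribute across processors.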

Highlights

  • Visual tracking is an active research topic in computer vision

  • This paper focuses on spatiotemporal saliency detection to deal with appearance variation difficulties in aerial videos, and proposes a spatial saliency detection method for visual target representation

  • The videos are collected from the VIVID dataset [46] and exhibit appearance variation difficulties such as complicated backgrounds, illumination changes, scale changes, and pose variations


Introduction

Visual tracking is an active research topic in computer vision. It has been used in many applications, such as activity recognition, surveillance, robotics, and human-computer interaction [1]. Visual tracking algorithms and systems often fail on aerial videos. The sources of this failure include appearance variations of the target caused by relative camera and target motion, inadequate spatial resolution or noise, scale changes, and pose variations [3,4,5]. An efficient visual representation is crucial to describe the target in the scene and to generate a sample model [4,8]. The proposed method is able to detect moving targets efficiently against noisy backgrounds and under long-term occlusions. A relative distance change (RDC) measure is proposed to distinguish the target from the background scene; it is invariant to image rotation, translation, and scaling. The proposed method is detailed in the following subsections.
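
The excerpt does not specify the form of the discriminative online learner or of the sample model, so the following is only an illustrative sketch under assumed choices: a linear logistic model updated by stochastic gradient descent, where `OnlineSampleModel`, `lr`, and `l2` are hypothetical names and parameters introduced here and the feature extraction is left out.

```python
# Illustrative online discriminative update; the actual classifier, features,
# and learning-rate schedule used by the paper are not given in this excerpt.
import numpy as np


class OnlineSampleModel:
    def __init__(self, n_features, lr=0.05, l2=1e-4):
        self.w = np.zeros(n_features)   # linear weights of the sample model
        self.b = 0.0                    # bias term
        self.lr, self.l2 = lr, l2       # learning rate and L2 regularization

    def score(self, x):
        """Confidence that candidate feature vector x belongs to the target."""
        return 1.0 / (1.0 + np.exp(-(self.w @ x + self.b)))

    def update(self, samples, labels):
        """Incremental SGD step on new target (label 1) / background (label 0) samples."""
        for x, y in zip(samples, labels):
            err = self.score(x) - y     # gradient of the logistic loss
            self.w -= self.lr * (err * x + self.l2 * self.w)
            self.b -= self.lr * err
```

In such a scheme, features sampled inside the spatiotemporal-salient region of each frame would be passed as positives and surrounding background features as negatives, so the sample model adapts incrementally to appearance changes rather than being retrained from scratch.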

