Abstract
Single visual object tracking from an unmanned aerial vehicle (UAV) poses fundamental challenges such as object occlusion, small-scale objects, background clutter, and abrupt camera motion. To tackle these difficulties, we propose to integrate the 3D structure of the observed scene into a detection-by-tracking algorithm. We introduce a pipeline that combines a model-free visual object tracker, a sparse 3D reconstruction, and a state estimator. The 3D reconstruction of the scene is computed with an image-based Structure-from-Motion (SfM) component, which enables us to leverage a state estimator in the corresponding 3D scene during tracking. By representing the position of the target in 3D space rather than in image space, we stabilize the tracking during ego-motion and improve the handling of occlusions, background clutter, and small-scale objects. We evaluated our approach on prototypical image sequences captured from a UAV with low-altitude oblique views. For this purpose, we adapted an existing dataset for visual object tracking and reconstructed the observed scene in 3D. The experimental results demonstrate that the proposed approach outperforms methods using plain visual cues as well as approaches leveraging image-space-based state estimation. We believe that our approach can be beneficial for traffic monitoring, video surveillance, and navigation.
Highlights
In recent years, the use of unmanned aerial vehicles (UAVs) has expanded, together with the range of applications they support, such as video surveillance, traffic monitoring, aerial photography, wildlife protection, cinematography, target following, disaster response, and even delivery.
We use the following terms: (1) "original", which refers to the unmodified visual object trackers ATOM and DiMP presented in [27,28]; (2) "2D variant", denoting ATOM and DiMP coupled with a particle filter operating in the 2D image space (ATOM-2D and DiMP-2D). A minimal sketch of such an image-space particle filter is given after these highlights.
Besides the challenges arising from the specific characteristics of single visual object tracking from UAVs, the use of computer vision approaches onboard a UAV faces the problem of finding an adequate compromise between computational complexity and real-time capability under the extreme resource limitations of the platform.
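The summary does not detail the motion and observation models of the particle filter used in the 2D variant, so the following is a minimal, hypothetical sketch: a bootstrap particle filter over image coordinates with an assumed constant-velocity motion model and a Gaussian likelihood centered on the tracker's per-frame detection. The class name `ParticleFilter2D` and all noise parameters are illustrative, not taken from the paper.

```python
import numpy as np

class ParticleFilter2D:
    """Minimal bootstrap particle filter over image coordinates.

    State per particle: [x, y, vx, vy]. Constant-velocity motion and a
    Gaussian likelihood around the tracker's detection are assumptions
    for illustration, not the models reported in the paper.
    """

    def __init__(self, init_xy, n_particles=500, motion_std=3.0, obs_std=10.0):
        self.n = n_particles
        self.motion_std = motion_std
        self.obs_std = obs_std
        self.particles = np.zeros((self.n, 4))
        self.particles[:, :2] = np.asarray(init_xy) + np.random.randn(self.n, 2) * obs_std
        self.weights = np.full(self.n, 1.0 / self.n)

    def predict(self):
        # Propagate positions with the per-particle velocity, then add noise.
        self.particles[:, :2] += self.particles[:, 2:]
        self.particles += np.random.randn(self.n, 4) * self.motion_std

    def update(self, detection_xy, confidence=1.0):
        # Re-weight particles with a Gaussian likelihood around the tracker
        # output; a low-confidence detection yields a weaker correction.
        diff = self.particles[:, :2] - np.asarray(detection_xy)
        sigma = self.obs_std / max(confidence, 1e-3)
        self.weights *= np.exp(-0.5 * np.sum(diff ** 2, axis=1) / sigma ** 2)
        self.weights += 1e-300          # avoid an all-zero weight vector
        self.weights /= self.weights.sum()
        # Multinomial resampling when the effective sample size collapses.
        if 1.0 / np.sum(self.weights ** 2) < self.n / 2:
            idx = np.random.choice(self.n, self.n, p=self.weights)
            self.particles = self.particles[idx]
            self.weights.fill(1.0 / self.n)

    def estimate(self):
        # Weighted mean of particle positions as the filtered target location.
        return np.average(self.particles[:, :2], weights=self.weights, axis=0)
```

In such a 2D variant, the tracker's per-frame bounding-box center would be passed to update(), and estimate() would return the smoothed image position used as the tracking output.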
Summary
In recent years, the use of unmanned aerial vehicles (UAVs) has expanded, together with the range of applications they support, such as video surveillance, traffic monitoring, aerial photography, wildlife protection, cinematography, target following, disaster response, and even delivery. Initially used in the military domain, UAVs have gradually spread into civil and commercial fields, enabling new applications that incorporate, or will eventually incorporate, visual object tracking as a core component. Single visual object tracking is a long-studied computer vision problem relevant for many real-world applications. Its goal is to estimate the location of an object in an image sequence, given its location in the first frame. When a state estimator is integrated into the tracking process, the pipeline is referred to as detection-by-tracking; without one, as tracking-by-detection [1]. Although deep learning for visual object tracking solves some challenging tasks to a certain extent (e.g., illumination changes, motion blur, scale variation), there are still situations that remain difficult to handle, e.g., partial and full occlusion, abrupt object motions, or background clutter.
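The abstract states that the target position is represented in the reconstructed 3D scene rather than in image space, but this summary does not specify how a per-frame 2D detection is lifted into that scene. The sketch below is one plausible, simplified realization assuming known per-frame camera poses and intrinsics from the SfM component: the detection is back-projected along its viewing ray and associated with the nearest sparse scene point, and the resulting world coordinate is what a 3D state estimator would filter before reprojection into the image. The function names and the nearest-point association rule are illustrative assumptions, not the paper's exact method.

```python
import numpy as np

def backproject_to_scene(u, v, K, R, t, scene_points):
    """Associate a 2D detection (u, v) with the sparse SfM reconstruction.

    Assumes a pinhole camera with intrinsics K and a world-to-camera pose
    (R, t) estimated by SfM; the target is placed at the reconstructed point
    closest to the viewing ray (an illustrative association rule).
    scene_points: (N, 3) sparse point cloud in world coordinates.
    """
    ray_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])    # viewing ray in camera frame
    ray_world = R.T @ ray_cam
    ray_world /= np.linalg.norm(ray_world)
    cam_center = -R.T @ t                                  # camera center in world frame

    rel = scene_points - cam_center
    along = rel @ ray_world                                # signed distance along the ray
    closest_on_ray = cam_center + np.outer(along, ray_world)
    dist = np.linalg.norm(scene_points - closest_on_ray, axis=1)
    dist[along <= 0] = np.inf                              # ignore points behind the camera
    return scene_points[np.argmin(dist)]

def project_to_image(X, K, R, t):
    """Project a filtered 3D world position back into the current image."""
    x_cam = R @ np.asarray(X) + t
    x_img = K @ x_cam
    return x_img[:2] / x_img[2]
```

A state estimator (for instance, the particle filter sketched above, extended to three dimensions) would then operate on these world coordinates, which is what makes the estimate largely invariant to the UAV's ego-motion.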