Abstract
This paper presents a long-term object tracking framework with a moving event camera under general tracking conditions. A first of its kind for these revolutionary cameras, the tracking framework uses a discriminative representation for the object with online learning, and detects and re-tracks the object when it comes back into the field-of-view. One of the key novelties is the use of an event-based local sliding window technique that tracks reliably in scenes with cluttered and textured backgrounds. In addition, Bayesian bootstrapping is used to assist real-time processing and boost the discriminative power of the object representation. When the object re-enters the field-of-view of the camera, a data-driven, global sliding window detector locates it for subsequent tracking. Extensive experiments demonstrate the ability of the proposed framework to track and detect arbitrary objects of various shapes and sizes, including dynamic objects such as a human. This is a significant improvement over earlier works that track objects only while they remain visible and under simpler background settings. Using the ground-truth locations for five different objects under three motion settings, namely translation, rotation and 6-DOF, quantitative results are reported for the event-based tracking framework, with critical insights on various performance issues. Finally, a real-time implementation in C++ highlights tracking ability under scale, rotation, viewpoint and occlusion scenarios in a lab setting.
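For illustration only, the following is a minimal sketch of how an event-based local sliding window search could be organised: events from a short temporal slice are gathered into candidate windows around the previous object estimate, and each window is scored by a learned discriminative model. The names and parameters here (`score_window`, `search_radius`, `stride`) are hypothetical and are not taken from the paper.

```python
import numpy as np

def local_sliding_window_track(events, prev_box, score_window,
                               search_radius=20, stride=4):
    """Search candidate windows around the previous object box and return
    the best-scoring one (a sketch, not the paper's implementation).

    events       : (N, 2) array of event (x, y) coordinates in a short
                   temporal slice (timestamps/polarity omitted for brevity)
    prev_box     : (x, y, w, h) of the previous object estimate
    score_window : callable mapping the events inside a candidate box to a
                   discriminative score, e.g. a classifier response over a
                   learned codebook representation (assumed interface)
    """
    x0, y0, w, h = prev_box
    best_box, best_score = prev_box, -np.inf
    for dx in range(-search_radius, search_radius + 1, stride):
        for dy in range(-search_radius, search_radius + 1, stride):
            bx, by = x0 + dx, y0 + dy
            # Keep only events falling inside the shifted candidate window.
            inside = events[(events[:, 0] >= bx) & (events[:, 0] < bx + w) &
                            (events[:, 1] >= by) & (events[:, 1] < by + h)]
            s = score_window(inside)
            if s > best_score:
                best_score, best_box = s, (bx, by, w, h)
    return best_box, best_score
```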
Highlights
Standard video cameras struggle to capture crisp images of scenes characterized by high dynamic range and motion, returning blurred or saturated images.
This paper introduces a simple and efficient object tracking framework, consisting of a local tracker and a global detector, by taking advantage of the sparsity and higher temporal resolution of the event camera.
In all three motion cases, event-based tracking-learning-detection (e-TLD) comprehensively outperforms event-based long-term object tracking (e-LOT) on the average overlap success (OS) score, while performing slightly under par for the 'cup' object (a sketch of the overlap computation is given below). We attribute this anomaly to the e-LOT system being tailor-made for tracking objects against cleaner backgrounds, a condition the 'cup' object encounters due to its unique placement in the scene compared to the other objects.
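The overlap success (OS) score mentioned above is, in common tracking benchmarks, the fraction of estimates whose intersection-over-union with the ground-truth box exceeds a threshold; the sketch below assumes that definition, and the paper's exact evaluation protocol may differ.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def overlap_success(pred_boxes, gt_boxes, threshold=0.5):
    """Fraction of predictions whose IoU with ground truth exceeds the
    threshold (an assumed, commonly used definition of the OS score)."""
    hits = sum(iou(p, g) >= threshold for p, g in zip(pred_boxes, gt_boxes))
    return hits / len(gt_boxes)
```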
Summary
Standard video cameras struggle to capture crisp images of scenes characterized by high dynamic range and motion, returning blurred or saturated images. The goal of this work is to introduce a general-purpose method for tracking objects from event camera data that can be efficiently implemented in software, in contrast to the ever-growing neural network paradigms that potentially require hours of re-training for online learning. Apart from the online learning ability of e-TLD, the core training process is the codebook learning step, which requires under a minute for 500 ms worth of data on a standard PC using efficient sampling strategies [8]. This requires significantly lower resources than Siamese deep neural network object tracking paradigms [9], which require ASIC implementations for real-time inference on each video frame. The framework also re-captures the target after it is temporarily occluded by other objects or when it re-appears after exiting the field-of-view.
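As a rough illustration of the codebook learning step mentioned above, the sketch below clusters local event-based descriptors with plain k-means and encodes a window as a normalized codeword histogram. This is a standard codebook-learning recipe assumed for illustration; the paper's actual descriptor extraction and efficient sampling strategy [8] may differ.

```python
import numpy as np

def learn_codebook(descriptors, k=64, iters=20, seed=0):
    """Learn a k-word codebook from local event-based descriptors via k-means.

    descriptors : (N, D) array, one row per local descriptor
    returns     : (k, D) array of codewords
    """
    rng = np.random.default_rng(seed)
    # Initialize codewords from randomly chosen descriptors.
    codebook = descriptors[rng.choice(len(descriptors), k, replace=False)].astype(float)
    for _ in range(iters):
        # Assign each descriptor to its nearest codeword (squared Euclidean).
        d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)
        # Move each codeword to the mean of its assigned descriptors.
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):
                codebook[j] = members.mean(axis=0)
    return codebook

def encode(descriptors, codebook):
    """Represent a set of descriptors as a normalized codeword histogram."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    hist = np.bincount(d2.argmin(axis=1), minlength=len(codebook)).astype(float)
    return hist / (hist.sum() + 1e-9)
```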