Abstract

This paper presents a long-term object tracking framework for a moving event camera under general tracking conditions. The first framework of its kind for these cameras, it uses a discriminative object representation with online learning, and detects and re-tracks the object when it comes back into the field of view. One of the key novelties is an event-based local sliding window technique that tracks reliably in scenes with cluttered and textured backgrounds. In addition, Bayesian bootstrapping is used to support real-time processing and to boost the discriminative power of the object representation. When the object re-enters the camera's field of view, a data-driven, global sliding window detector locates it for subsequent tracking. Extensive experiments demonstrate that the proposed framework can track and detect arbitrary objects of various shapes and sizes, including dynamic objects such as a human. This is a significant improvement over earlier works, which simply track objects as long as they are visible and only against simpler backgrounds. Using ground-truth locations for five different objects under three motion settings, namely translation, rotation and 6-DOF, quantitative results are reported for the event-based tracking framework, with critical insights into various performance issues. Finally, a real-time C++ implementation demonstrates tracking under scale, rotation, viewpoint and occlusion scenarios in a lab setting.
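The abstract does not say exactly where Bayesian bootstrapping enters the pipeline. As a generic illustration of the technique itself (Rubin's Bayesian bootstrap), the sketch below draws flat Dirichlet weights over a set of samples and then resamples a smaller subset according to those weights, which is one plausible way such resampling could bound the cost of each model update. The helper names, the toy event tuples and the subset size of 64 are assumptions for illustration only, not the paper's method.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def dirichlet_flat_weights(n: int, rng: random.Random) -> List[float]:
    """Draw one weight vector from Dirichlet(1, ..., 1).

    Normalising i.i.d. Exp(1) draws (i.e. Gamma(1, 1)) yields a flat Dirichlet
    sample, which is the weighting used by the Bayesian bootstrap.
    """
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


def bayesian_bootstrap_subsample(samples: Sequence[T], k: int, rng: random.Random) -> List[T]:
    """Hypothetical helper: draw k items with replacement under Bayesian-bootstrap weights."""
    if not samples:
        raise ValueError("need at least one sample")
    weights = dirichlet_flat_weights(len(samples), rng)
    # random.choices samples with replacement, proportionally to the weights.
    return rng.choices(list(samples), weights=weights, k=k)


# Toy usage: thin 1000 synthetic (x, y, t) events down to 64 for one update.
rng = random.Random(0)
events = [(i % 32, i % 24, i * 10) for i in range(1000)]
subset = bayesian_bootstrap_subsample(events, k=64, rng=rng)
print(len(subset))  # 64
```

Drawing fresh Dirichlet weights per update means each update sees a different, randomly reweighted view of the data, which is the property that distinguishes the Bayesian bootstrap from uniform subsampling.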

Highlights

  • Standard video cameras struggle to capture crisp images of scenes characterized by high dynamic range and motion, returning blurred or saturated images

  • This paper introduces a simple and efficient object tracking framework, consisting of a local tracker and a global detector, by taking advantage of the sparsity and higher temporal resolution of the event camera

  • In all three motion cases, event-based tracking-learning-detection (e-TLD) comprehensively outperforms event-based long-term object tracking (e-LOT) on the average overlap success (OS) score, while performing slightly below par for the ‘cup’ object. We attribute this anomaly to e-LOT being tailored to tracking objects against cleaner backgrounds, which the ‘cup’ object encounters owing to its unique placement in the scene relative to the other objects
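The highlights compare trackers by average overlap success (OS). The exact definition is not given in this summary; a common convention, assumed here, computes the intersection-over-union (IoU) between predicted and ground-truth boxes in each frame, takes the fraction of frames whose IoU exceeds each threshold in [0, 1], and averages those success rates over the thresholds (the area under the success curve). Boxes are (x, y, width, height); the function names and the 21-threshold grid are hypothetical choices.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def average_overlap_success(pred: Sequence[Box], truth: Sequence[Box], steps: int = 21) -> float:
    """Mean success rate over IoU thresholds 0, 1/(steps-1), ..., 1 (assumed OS definition)."""
    if len(pred) != len(truth) or not pred:
        raise ValueError("need equal-length, non-empty box sequences")
    overlaps = [iou(p, g) for p, g in zip(pred, truth)]
    thresholds = [i / (steps - 1) for i in range(steps)]
    # Success rate at threshold t: fraction of frames whose IoU exceeds t.
    rates = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return sum(rates) / len(rates)


# Toy usage: a perfect frame and a half-shifted frame.
truth = [(0, 0, 10, 10), (0, 0, 10, 10)]
pred = [(0, 0, 10, 10), (5, 0, 10, 10)]
print(iou(pred[1], truth[1]))  # ≈ 0.333
score = average_overlap_success(pred, truth)
print(round(score, 3))  # 0.643
```

Averaging over thresholds rewards a tracker that stays roughly on target in every frame, rather than one that is very precise in some frames and lost in others.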


Summary

INTRODUCTION

Standard video cameras struggle to capture crisp images of scenes characterized by high dynamic range and motion, returning blurred or saturated images. The aim is to introduce a general-purpose method for tracking objects in event-camera data that can be implemented efficiently, at least in software, in contrast to the ever-growing neural network paradigms that may require hours of re-training for online learning. Apart from the online learning in e-TLD, the core training process is the codebook learning step, which takes under a minute for 500 ms of data on a standard PC using efficient sampling strategies [8]. This requires significantly fewer resources than Siamese deep neural network object tracking paradigms [9], which require ASIC implementations for real-time inference on each video frame.

2) Re-capturing the target after it is temporarily occluded by other objects, or when it re-appears after exiting
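The introduction attributes most of the training cost to codebook learning accelerated by sampling [8], but this summary does not name the clustering algorithm or the features. The sketch below assumes plain k-means (Lloyd's algorithm) run on a random subsample of feature vectors, and encodes a patch as a normalised histogram of nearest-codeword assignments (a bag-of-words style representation). `learn_codebook`, `encode`, the toy 2-D descriptors and all parameter values are hypothetical.

```python
import random
from typing import List, Sequence

Vec = List[float]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _nearest(v: Sequence[float], codebook: Sequence[Vec]) -> int:
    """Index of the codeword closest to v in squared Euclidean distance."""
    return min(range(len(codebook)), key=lambda j: _sq_dist(v, codebook[j]))


def learn_codebook(features: Sequence[Vec], k: int, sample_size: int,
                   iters: int = 20, seed: int = 0) -> List[Vec]:
    """k-means (Lloyd's algorithm) on a random subsample of the feature vectors."""
    rng = random.Random(seed)
    sample = rng.sample(list(features), min(sample_size, len(features)))
    if not 0 < k <= len(sample):
        raise ValueError("k must be between 1 and the subsample size")
    centres = [list(c) for c in rng.sample(sample, k)]
    for _ in range(iters):
        clusters: List[List[Vec]] = [[] for _ in range(k)]
        for v in sample:
            clusters[_nearest(v, centres)].append(v)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centre
                dim = len(members[0])
                centres[j] = [sum(m[d] for m in members) / len(members) for d in range(dim)]
    return centres


def encode(patch_features: Sequence[Vec], codebook: Sequence[Vec]) -> List[float]:
    """Normalised histogram of nearest-codeword assignments for one patch."""
    hist = [0.0] * len(codebook)
    for v in patch_features:
        hist[_nearest(v, codebook)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# Toy usage: two well-separated groups of 2-D descriptors.
features = [[i * 0.01, 0.0] for i in range(10)] + [[10.0 + i * 0.01, 10.0] for i in range(10)]
codebook = learn_codebook(features, k=2, sample_size=20)
hist = encode([[0.0, 0.0], [0.05, 0.0]], codebook)
print(sorted(hist))  # [0.0, 1.0]
```

Clustering only a subsample keeps the cost of each Lloyd iteration proportional to `sample_size` rather than to the full number of descriptors, which is the kind of saving the sampling strategies are credited with.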

EVENT-BASED PROCESSING
Related Work
Contribution
METHODOLOGY
Event-Based Object Tracker
Event-Based Object Detector
12: Choose the highest activation in O to re-initialize B_t
EXPERIMENTS
Parameters
Results on Event Camera Dataset
Real-Time Testing
Findings
DISCUSSION
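The tracker and detector sections, together with step 12 of the detector algorithm (choosing the highest activation in O to re-initialize B_t), can be illustrated with a toy sliding-window search. The sketch assumes that the activation at each window position is simply the number of events inside the window, computed in constant time per window from an integral image; in the paper the activation comes from the learned discriminative representation, which is not reproduced here. Restricting the search region around the previous box gives a local search, and using the whole sensor gives a global one. All names, the margin and the toy data are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Event = Tuple[int, int]  # (x, y) pixel coordinates; timestamps and polarity ignored
Box = Tuple[int, int, int, int]  # (x, y, width, height)


def integral_image(events: Sequence[Event], width: int, height: int) -> List[List[int]]:
    """(height+1) x (width+1) summed-area table of per-pixel event counts."""
    counts = [[0] * width for _ in range(height)]
    for x, y in events:
        if 0 <= x < width and 0 <= y < height:
            counts[y][x] += 1
    ii = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += counts[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def window_sum(ii: List[List[int]], x: int, y: int, w: int, h: int) -> int:
    """Number of events inside the w x h window whose top-left corner is (x, y)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def best_window(ii: List[List[int]], box_w: int, box_h: int,
                region: Box, stride: int = 1) -> Tuple[Box, int]:
    """Slide a box_w x box_h window over `region`; return the highest-activation box."""
    rx, ry, rw, rh = region
    best: Optional[Tuple[Box, int]] = None
    for y in range(ry, ry + rh - box_h + 1, stride):
        for x in range(rx, rx + rw - box_w + 1, stride):
            score = window_sum(ii, x, y, box_w, box_h)
            if best is None or score > best[1]:
                best = ((x, y, box_w, box_h), score)
    if best is None:
        raise ValueError("search region is smaller than the window")
    return best


def local_region(prev: Box, margin: int, width: int, height: int) -> Box:
    """Search region around the previous box, clipped to the sensor."""
    x, y, w, h = prev
    x0, y0 = max(0, x - margin), max(0, y - margin)
    x1, y1 = min(width, x + w + margin), min(height, y + h + margin)
    return (x0, y0, x1 - x0, y1 - y0)


# Toy usage: a dense 4 x 4 object plus two noise events on a 40 x 30 sensor.
W, H = 40, 30
blob = [(20 + dx, 10 + dy) for dx in range(4) for dy in range(4)]
events = blob * 3 + [(1, 1), (38, 28)]
ii = integral_image(events, W, H)

# Global detector: search the whole sensor (re-initialization of B_t).
box, score = best_window(ii, 4, 4, (0, 0, W, H))
print(box, score)  # (20, 10, 4, 4) 48

# Local tracker: search only near the previous box.
local_box, local_score = best_window(ii, 4, 4, local_region((18, 9, 4, 4), 3, W, H))
```

The local search evaluates far fewer windows than the global one, which is why a tracker would normally run locally and fall back to the global detector only when the object is lost.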