Abstract

Object tracking is a fundamental task in computer vision and artificial intelligence. However, state-of-the-art object tracking approaches are still prone to failures and are imprecise when applied to challenging scenarios, and their results are generally confidence agnostic. An imprecise deterministic output with low confidence may lead to disastrous consequences and a lack of proof for subsequent operations and human interventions. Deep network training with ambiguous data or the noise inherent in observations (i.e., data uncertainty or aleatoric uncertainty) will result in inherent uncertainties in predictions. In this paper, we exploit probabilistic depth-aware object tracking with a conditional variational autoencoder (CVAE). First, we build a bridge between the Siamese network and the variational autoencoder conditioned with depth images and propose a novel multimodal Bayesian object tracking method. Second, our proposed method yields a complete probability distribution that enables the production of multiple plausible features. Third, the variational autoencoder conditioned by depth images encodes a low-dimensional latent space that conducts depth-aware tracking, which has obvious advantages for challenging tracking scenarios. Our proposed tracking method outperformed the state-of-the-art trackers on the VOT 2016, VOT 2018, and VOT 2019 datasets.

Highlights

  • In the past ten years, deep learning methods have gradually become typical algorithm architectures in computer vision and artificial intelligence due to their great performance improvements in accuracy

  • 1) We build a bridge between the Siamese network and the variational autoencoder conditioned with depth images and propose a novel multimodal Bayesian object tracking method

  • In this paper, by focusing on exploiting probabilistic depth-aware Siamese object tracking with a conditional variational autoencoder (CVAE), we integrate uncertainty learning into object tracking and develop a tracking network to a Bayesian neural network to generate a complete probability distribution

Read more

Summary

Introduction

In the past ten years, deep learning methods have gradually become typical algorithm architectures in computer vision and artificial intelligence due to their great performance improvements in accuracy. As a fundamental branch of computer vision, is promoted by deep learning. Deep learning defeats many traditional machine learning methods (e.g., correlation filters and support vector machines) and dominates the field of object tracking. Large amounts of existing object tracking methods are still prone to failures. When applied to challenging scenarios, and their results are generally confidence agnostic. Deterministic features or regressions, provided by these methods based on an embedded deep learning model (e.g., a convolutional neural network), are without uncertainty predictions. For example, in regressions, some related works in the past often mistakenly consider the maximum regression responses at the end of the model pipeline as confidence or uncertainty; even if a model generates high regression prediction responses, it is proven to be uncertain [1]

Methods
Results
Conclusion

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.