Abstract

Appearance representation and the observation model are the most important components in designing a robust visual tracking algorithm for video-based sensors. The exemplar-based linear discriminant analysis (ELDA) model has shown good performance in object tracking, and we build on it by improving the ELDA tracking algorithm with deep convolutional neural network (CNN) features and adaptive model updates. Deep CNN features have been used successfully in various computer vision tasks, but extracting them on every candidate window is time consuming. To address this problem, we propose a two-step CNN feature extraction method that computes the convolutional layers and the fully-connected layers separately. Exploiting the strong discriminative ability of CNN features and the exemplar-based model, we update both the object and background models to improve their adaptivity and to balance the tradeoff between discriminative ability and adaptivity. An object model updating method is proposed to select the “good” models (detectors): those that are highly discriminative and uncorrelated with the other selected models. Meanwhile, we build the background model as a Gaussian mixture model (GMM), initialized offline and updated online, to adapt to complex scenes. The proposed tracker is evaluated on a benchmark dataset of 50 video sequences with various challenges. It achieves the best overall performance among the compared state-of-the-art trackers, demonstrating the effectiveness and robustness of our tracking algorithm.

Highlights

  • Visual tracking is a critical technique in many applications [1,2,3], such as surveillance [4,5], robot vision [6], etc.

  • There are four main contributions: (1) we introduce convolutional neural network (CNN) features into visual tracking without training a deep network; (2) we propose a two-step CNN feature extraction method to speed up the algorithm; (3) a new strategy is proposed to update object models according to their discriminative ability and correlation; (4) a Gaussian mixture model (GMM) is used to build the background model, improving adaptivity in complex scenes

  • In our two-step CNN extraction process, the input image is no smaller than 224 × 224; the size of the conv5 feature maps depends on the input image size; and a 13 × 13 sliding window is applied on conv5, each window producing a 4096-dimensional fc7 feature vector
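
The two-step idea above can be illustrated with a minimal NumPy sketch; this is not the paper's implementation. The conv5 map stands in for the feature map computed once over the whole search region (step 1), and each 13 × 13 window on it is flattened and passed through stand-in fully-connected weights (step 2). The channel count and fc7 dimension are shrunk from the real 512 and 4096 so the sketch stays small.

```python
import numpy as np

# Sketch of the two-step extraction. In the paper's setup conv5 has 512
# channels and fc7 is 4096-dimensional; both are shrunk here so the
# stand-in weights stay small. The random conv5 array plays the role of
# the feature map computed once per frame (step 1).
C, H, W = 8, 16, 16      # stand-in for 512-channel conv5 maps
WIN = 13                 # sliding-window size on conv5, as in the paper
FC_DIM = 32              # stand-in for the 4096-d fc7 output

rng = np.random.default_rng(0)
conv5 = rng.standard_normal((C, H, W))                      # step 1: once per frame
W_fc = rng.standard_normal((WIN * WIN * C, FC_DIM)) * 0.01  # stand-in fc weights

def fc7_features(conv5, W_fc, win=WIN):
    """Step 2: slide a win x win window over conv5; each window is
    flattened and mapped through the fully-connected weights (with a
    ReLU) to one fc7-like feature vector."""
    C, H, W = conv5.shape
    feats = []
    for y in range(H - win + 1):
        for x in range(W - win + 1):
            patch = conv5[:, y:y + win, x:x + win].reshape(-1)
            feats.append(np.maximum(patch @ W_fc, 0.0))
    return np.stack(feats)

feats = fc7_features(conv5, W_fc)
print(feats.shape)  # one feature vector per candidate window: (16, 32)
```

With the real network, the expensive convolutional layers run once over the whole search region, so only the cheap fully-connected pass is repeated for each candidate window.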


Summary

Introduction

Visual tracking is a critical technique in many applications [1,2,3], such as surveillance [4,5] and robot vision [6]. Tracking-by-detection has become an attractive tracking approach [7]: it treats tracking as a category detection problem and trains a detector to separate the object from the background. In this class of tracking methods, appearance representation and the observation model (classifier) play important roles, as in detection. We introduce CNN features into the exemplar-based tracking method for appearance representation and speed up their extraction by computing the convolutional layers and the fully-connected layers separately. We also propose a method to update the object models by selecting detectors that are strongly discriminative and uncorrelated with the other selected detectors, and we build the background model as a Gaussian mixture model (GMM) to cover the complex variations of the scenes.
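
The background GMM described above can be sketched with a minimal online model. This is a generic responsibility-weighted online update over whatever feature vectors represent the background, not the paper's exact equations; the offline initialization is stood in by random means, and the 4-dimensional features in the demo are hypothetical.

```python
import numpy as np

class OnlineGMM:
    """Diagonal-covariance GMM background model: initialized (here
    randomly, standing in for an offline fit) and then updated online
    with a learning rate. A generic sketch, not the paper's exact rule."""

    def __init__(self, K, dim, lr=0.05, seed=0):
        rng = np.random.default_rng(seed)
        self.pi = np.full(K, 1.0 / K)            # mixture weights
        self.mu = rng.standard_normal((K, dim))  # component means
        self.var = np.ones((K, dim))             # diagonal variances
        self.lr = lr

    def _resp(self, x):
        # responsibilities of each component for x (diagonal Gaussians)
        logp = -0.5 * np.sum((x - self.mu) ** 2 / self.var + np.log(self.var), axis=1)
        logp += np.log(self.pi)
        p = np.exp(logp - logp.max())
        return p / p.sum()

    def update(self, x):
        # online EM-style step: each component moves toward x in
        # proportion to its responsibility, scaled by the learning rate
        r = self._resp(x)
        eta = self.lr * r[:, None]
        self.pi = (1.0 - self.lr) * self.pi + self.lr * r
        self.mu += eta * (x - self.mu)
        self.var = np.maximum((1.0 - eta) * self.var + eta * (x - self.mu) ** 2, 1e-6)

    def score(self, x):
        # log mixture density at x: higher means "more background-like"
        dens = self.pi * np.exp(-0.5 * np.sum((x - self.mu) ** 2 / self.var, axis=1))
        dens /= np.sqrt(np.prod(2.0 * np.pi * self.var, axis=1))
        return float(np.log(dens.sum()))

# feed background feature vectors frame by frame (hypothetical 4-d features)
gmm = OnlineGMM(K=3, dim=4)
for _ in range(50):
    gmm.update(np.ones(4))
```

After adapting online, the model assigns a higher score to features resembling what it has seen than to outliers, which is how a background likelihood can be read off during tracking.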

Exemplar-Based Tracker
Appearance Representations in Tracking-By-Detection Methods
Discriminative Models
Generative Model
Deep Networks in Tracking
Differences with ELDA
ELDA Tracker
Appearance Representations
Object Model Update
Background Model Update
Experimental Results
Implementation Details
Overall Performance
Quantitative Comparison
Evaluation of Components
Conclusions