Abstract

State-of-the-art trackers based on deep learning have no dedicated strategy for capturing the geometric deformation of the target. Because the affine manifold captures changes in target shape well, and because the higher layers of a Convolutional Neural Network (CNN) better describe the semantic information of objects, we propose a new tracking algorithm that combines affine transformation with convolutional features to track targets undergoing dramatic deformation. First, the affine transformation is applied to predict possible locations of the target; then a correlation filter is designed to compute the appearance confidence score that determines the final target location. Furthermore, a standard discriminative correlation filter is used to exploit the convolutional features, which is more efficient than other methods used with CNNs. Comprehensive experiments demonstrate the outstanding performance of our tracking algorithm compared with state-of-the-art techniques on public benchmarks.

Highlights

  • Visual object tracking is one of the fundamental tasks in computer vision, with applications ranging from missile guidance and computer vision to autonomous driving. Modeling the deformation of the target is key to obtaining stable tracking results.

  • Because the affine manifold describes the geometric deformation of the target well, the deformation models of visual tracking algorithms are largely built on the affine group.

  • The results show that the proposed method outperforms four other state-of-the-art methods on all three evaluation metrics: spatial robustness evaluation (SRE), one-pass evaluation (OPE), and temporal robustness evaluation (TRE).
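
The affine-group deformation model named in the highlights can be illustrated with a small sketch. It is not the authors' implementation: it assumes candidate target states are 3x3 homogeneous affine matrices, generated by sampling a perturbation in the Lie algebra and mapping it onto the group with the matrix exponential. The function names and the parameters `count`, `scale`, and `shift` are hypothetical and chosen only for illustration.

```python
import math
import random


def mat_mul(a, b):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def mat_exp(m, terms=20):
    """Matrix exponential by truncated Taylor series (fine for small inputs)."""
    result = [[float(i == j) for j in range(3)] for i in range(3)]
    term = [row[:] for row in result]
    for n in range(1, terms):
        # term now holds m^n / n!
        term = [[v / n for v in row] for row in mat_mul(term, m)]
        result = [[r + t for r, t in zip(r_row, t_row)]
                  for r_row, t_row in zip(result, term)]
    return result


def sample_candidates(state, count, scale=0.05, shift=2.0, rng=None):
    """Perturb an affine state with random Lie-algebra elements (assumed model)."""
    rng = rng or random.Random(0)
    candidates = []
    for _ in range(count):
        # Tangent vector: the last row is zero, so exp() stays in the affine group.
        xi = [
            [rng.gauss(0, scale), rng.gauss(0, scale), rng.gauss(0, shift)],
            [rng.gauss(0, scale), rng.gauss(0, scale), rng.gauss(0, shift)],
            [0.0, 0.0, 0.0],
        ]
        candidates.append(mat_mul(state, mat_exp(xi)))
    return candidates


def warp_point(transform, x, y):
    """Apply a homogeneous affine transform to a 2-D point."""
    return (transform[0][0] * x + transform[0][1] * y + transform[0][2],
            transform[1][0] * x + transform[1][1] * y + transform[1][2])
```

Each candidate matrix could then be used to warp the target template's corner points with `warp_point`, giving one hypothesized target region per sample for an appearance model to score.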

Summary

INTRODUCTION

Visual object tracking is one of the fundamental tasks in computer vision, with applications ranging from missile guidance and computer vision to autonomous driving. References [2], [3] use affine transformation to describe the deformation process of the target and propose a target tracking algorithm that exploits the geometric structure of a Riemannian manifold. Convolutional filters have been widely used for visual tracking because of their high computational efficiency in the Fourier domain. These kinds of tracking methods [18], [29] do not need hard-thresholded samples of target appearance, because they regress all circularly shifted versions of the input features to a Gaussian function. To address the above issues, we use the affine manifold to capture the geometric transformation of the target and the output of the highest CNN layer to describe the semantic information of target appearance in building a new tracker. By combining affine transformation with correlation filters on the last convolutional layer, both semantics and geometric deformation are used simultaneously to handle large appearance and geometric variations without drifting. For more background on Lie groups, see references [40], [41].
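
The idea of regressing all circularly shifted versions of the input to a Gaussian function can be made concrete with a one-dimensional sketch. It is an illustrative simplification, not the paper's tracker: it assumes a single feature channel, a ridge-regression filter solved per frequency, and a naive DFT so that only the standard library is needed. The names `train_filter` and `respond` and the parameters `sigma` and `lam` are hypothetical.

```python
import cmath
import math


def dft(signal, inverse=False):
    """Naive O(n^2) discrete Fourier transform, written for clarity."""
    n = len(signal)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t, value in enumerate(signal):
            acc += value * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def gaussian_label(n, sigma=2.0):
    """Desired response: a Gaussian peaked at shift 0, wrapped circularly."""
    return [math.exp(-0.5 * (min(i, n - i) / sigma) ** 2) for i in range(n)]


def train_filter(features, lam=1e-2):
    """Ridge regression over all circular shifts, solved element-wise in frequency."""
    x_hat = dft(features)
    y_hat = dft(gaussian_label(len(features)))
    return [y * x.conjugate() / (x * x.conjugate() + lam)
            for x, y in zip(x_hat, y_hat)]


def respond(filter_hat, features):
    """Response map for new features; the peak index estimates the shift."""
    z_hat = dft(features)
    response = dft([h * z for h, z in zip(filter_hat, z_hat)], inverse=True)
    return [r.real for r in response]


if __name__ == "__main__":
    base = [math.exp(-0.5 * ((i - 10) / 1.5) ** 2) for i in range(32)]
    shifted = base[-5:] + base[:-5]  # circular shift to the right by 5 samples
    response = respond(train_filter(base), shifted)
    print(max(range(len(response)), key=response.__getitem__))
```

Because every circular shift of the training signal contributes to the regression, no explicit positive or negative samples are selected; the location of the response peak gives the estimated displacement.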

DESIGN THE GEOMETRIC TRANSFORMATION MODEL
TRACKING WITH CNN
DETAILS AND EXPERIMENTAL EVALUATION
Findings
CONCLUDING REMARKS