Abstract

Visual tracking based on deep learning has recently become an active research topic in computer vision. Deep convolutional neural networks (CNNs) are inherently limited to low spatial resolution because of the max pooling in their modules, and they carry a high computational burden. We adapt a pretrained deep network architecture to the task of visual tracking by introducing a wavelet representation into the network and a two-stage fine-tuning procedure for learning appearance features, which improves on the original deep learning tracker. Moreover, a loss layer based on Bayes' theorem replaces the softmax loss layer to compute the maximum classifier score, which improves the success rate. Wavelet pooling also performs feature dimension reduction, and the wavelet representation greatly reduces computation time. Compared with the original algorithm and other state-of-the-art methods, the proposed tracker performs better on the baseline test dataset. Because our optimized spectrum CNN extracts a compact and efficient representation of objects, it can be further applied to multiple-object tracking.
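To illustrate how wavelet pooling can reduce feature dimension, the following is a minimal sketch and not the authors' implementation. It assumes a single-level 2D Haar transform applied to one feature map with even height and width; the abstract does not name the wavelet family, and the function name `haar_wavelet_pool` is hypothetical. The low-frequency (LL) subband serves as the pooled output at half the spatial resolution, while the three detail subbands are returned for inspection.

```python
import numpy as np


def haar_wavelet_pool(feature_map):
    """Single-level 2D Haar decomposition of one feature map.

    Returns the LL (approximation) subband, which plays the role of the
    pooled output, plus the three detail subbands (LH, HL, HH).
    Assumption: Haar is used purely for illustration.
    """
    x = np.asarray(feature_map, dtype=np.float64)
    if x.ndim != 2:
        raise ValueError("expected a 2D feature map")
    h, w = x.shape
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")

    # The four samples of every non-overlapping 2x2 block.
    a = x[0::2, 0::2]  # top-left
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right

    # Orthonormal Haar analysis: each subband has shape (h/2, w/2).
    ll = (a + b + c + d) / 2.0  # local average (low-pass in both directions)
    lh = (a - b + c - d) / 2.0  # difference between columns
    hl = (a + b - c - d) / 2.0  # difference between rows
    hh = (a - b - c + d) / 2.0  # diagonal difference
    return ll, (lh, hl, hh)


if __name__ == "__main__":
    fmap = np.arange(16, dtype=np.float64).reshape(4, 4)
    pooled, details = haar_wavelet_pool(fmap)
    print(pooled.shape)  # spatial size is halved in each dimension
```

Unlike max pooling, which keeps one sample per block and discards the rest, this transform is orthonormal: the four subbands together preserve the energy of the input, so keeping only the LL subband discards exactly the high-frequency detail.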
