Abstract

To track visual objects robustly under a variety of challenging conditions, this study proposes a visual tracking method based on multilayer convolutional features and correlation filtering. To address the mean deviation and insufficient discrimination ability of traditional convolutional neural networks (CNN), we propose randomized parametric rectified linear units (RPReLU) as the activation function. In the traditional dropout process, weights are set to zero at random, without distinguishing between features of different weights, which leads to low learning efficiency. We therefore propose an improved dropout method based on a support vector machine (SVM), which applies a selective dropout rate so that the dropout process is guided rather than purely random, improving its learning efficiency. In addition, traditional CNN trackers employ only the output of the last layer, which captures semantic features effectively but not spatial features. To solve this problem, we use the rich features of multiple convolutional layers of CaffeNet as the target representation. Furthermore, we propose an improved correlation filter that further improves tracking performance and the tracker's ability to handle scale changes, which addresses the problem of adaptively estimating the target size. Extensive experimental evaluations on the OTB2015, VOT2016, and VOT2018 datasets show that the proposed method deals effectively with a variety of challenging factors.

Highlights

  • Visual target tracking is a valuable research topic that has been widely applied in frontier fields such as traffic accident supervision, autonomous driving, smart homes, and weapon control [1]–[4].

  • Contributions: This study proposes a robust visual tracker based on the multilayer convolutional features of convolutional neural networks (CNN) and correlation filtering.

  • As shown in Table 1, AlexNet with PReLU reduces the validation error from 41.14% to 40.83% compared with AlexNet with ReLU, while the randomized parametric rectified linear units (RPReLU) activation function further decreases the error from 40.83% to 39.45%.
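
The "mean deviation" that motivates replacing ReLU can be illustrated with a minimal sketch. It compares plain ReLU with PReLU on zero-mean inputs; the fixed slope of 0.25 and the standard-normal inputs are assumptions made for illustration (in PReLU the slope is normally learned). The randomized variant, RPReLU, is not defined in this section and is therefore not implemented here.

```python
import numpy as np


def relu(x):
    """ReLU: negative inputs are clamped to zero."""
    return np.maximum(x, 0.0)


def prelu(x, slope):
    """PReLU: identity for positive inputs, slope * x for negative inputs."""
    return np.where(x > 0, x, slope * x)


# Zero-mean pre-activations (an assumption made for this illustration).
rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)

# ReLU discards the negative half, so its output mean drifts above zero;
# a non-zero negative slope keeps part of that half and reduces the drift.
print("mean after ReLU :", relu(x).mean())
print("mean after PReLU:", prelu(x, slope=0.25).mean())
```

With these inputs the PReLU output mean lies closer to zero than the ReLU output mean, which is the effect the activation-function change targets.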

Summary

INTRODUCTION

Visual target tracking is a valuable research topic that has been widely applied in frontier fields such as traffic accident supervision, autonomous driving, smart homes, and weapon control [1]–[4]. Hao et al. [24] developed a new CNN-based tracking algorithm that decomposes the tracking process into translation estimation and scale estimation. This algorithm learns multiple correlation filters on CNN features and adaptively fuses their response maps to obtain better target positions. Because CaffeNet is a deep convolutional neural network with a comparatively simple and efficient structure, we propose to employ the multilayer features of CaffeNet as the target representation to solve these problems. As the fourth contribution, a correlation filter (CF) is introduced into our CNN tracker to further enhance tracking performance, improve the tracker's ability to deal with scale changes, and address the problem of adaptively estimating the target size.
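
To make the role of the correlation filter concrete, the following is a minimal sketch of a generic single-channel correlation filter learned in the Fourier domain (in the style of MOSSE). It is not the improved filter proposed in the paper: it works on one raw patch rather than on multilayer CaffeNet features, and it omits scale estimation. The function names, the Gaussian width `sigma`, and the regularization constant `lam` are assumptions chosen for illustration.

```python
import numpy as np


def gaussian_response(shape, sigma=2.0):
    """Desired filter output: a Gaussian peak at the patch centre."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = h // 2, w // 2
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma ** 2))


def train_filter(patch, target, lam=1e-2):
    """Closed-form ridge-regression filter (complex conjugate form)."""
    F = np.fft.fft2(patch)
    G = np.fft.fft2(target)
    return (G * np.conj(F)) / (F * np.conj(F) + lam)


def locate(filter_conj, patch):
    """Correlate a new patch with the filter and return the peak (row, col)."""
    response = np.real(np.fft.ifft2(np.fft.fft2(patch) * filter_conj))
    peak = np.unravel_index(np.argmax(response), response.shape)
    return tuple(int(i) for i in peak)


# Train on one synthetic patch, then search a circularly shifted copy.
rng = np.random.default_rng(0)
patch = rng.standard_normal((32, 32))
h_conj = train_filter(patch, gaussian_response(patch.shape))

shifted = np.roll(patch, shift=(3, -5), axis=(0, 1))
row, col = locate(h_conj, shifted)
print("estimated displacement:", (row - 16, col - 16))  # → (3, -5)
```

The response peak moves by the same offset as the target, so the displacement of the peak from the patch centre gives the translation estimate. A tracker that also handles scale changes would need an additional search over patch sizes, which this sketch leaves out.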

RPReLU ACTIVATION FUNCTION
EXPERIMENTAL ANALYSIS
Motion variation
Low resolution
Rotation
Occlusion
Findings
CONCLUSION