Abstract

Methods that combine correlation filters (CFs) with convolutional neural network (CNN) features perform well in object tracking. However, the high-level features of a typical CNN without residual structure lack fine-grained information, so trackers that rely on them are easily distracted by similar objects or background noise. Meanwhile, CF-based methods usually update their filters at every frame, even when occlusion occurs, which degrades their ability to discriminate the target from the background. This paper proposes a novel scale-adaptive object-tracking method. First, features are extracted from different layers of ResNet to produce response maps, and these response maps are fused using the AdaBoost algorithm to locate the target more accurately. Second, an update strategy with occlusion detection is proposed to prevent the filters from being updated when occlusion occurs. Finally, a scale filter is used to estimate the target scale. Experimental results show that the proposed method performs favorably against several mainstream methods, especially under occlusion and scale change.
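The abstract does not specify the exact fusion rule, how each layer's error is measured, or the occlusion criterion. As a rough illustration only, the Python sketch below assumes that each ResNet layer's response map acts as a weak learner with a known (hypothetical) error rate e, weights the maps with the classic AdaBoost coefficient alpha = 0.5 * ln((1 - e) / e), and gates the filter update with the peak-to-sidelobe ratio (PSR). PSR is a confidence measure commonly used in CF trackers, swapped in here as an assumption rather than taken from the paper. All function names, the PSR threshold of 8.0, and the learning rate of 0.01 are hypothetical, and scale estimation is not covered.

```python
import numpy as np


def adaboost_weights(errors):
    """AdaBoost-style layer weights alpha_k = 0.5 * ln((1 - e_k) / e_k), normalized to sum to 1.

    `errors` holds one hypothetical error rate per CNN layer; how it is measured is an assumption.
    """
    e = np.clip(np.asarray(errors, dtype=float), 1e-6, 1 - 1e-6)
    alpha = np.maximum(0.5 * np.log((1.0 - e) / e), 0.0)  # layers at or worse than chance get weight 0
    total = alpha.sum()
    if total == 0.0:
        return np.full(len(e), 1.0 / len(e))  # fall back to equal weights
    return alpha / total


def fuse_responses(responses, errors):
    """Weighted sum of same-sized response maps; the target is placed at the fused peak."""
    weights = adaboost_weights(errors)
    fused = sum(w * r for w, r in zip(weights, responses))
    row, col = np.unravel_index(np.argmax(fused), fused.shape)
    return fused, (int(row), int(col))


def psr(response, exclude=5):
    """Peak-to-sidelobe ratio; the sidelobe is everything outside a window around the peak."""
    row, col = np.unravel_index(np.argmax(response), response.shape)
    mask = np.ones(response.shape, dtype=bool)
    mask[max(0, row - exclude):row + exclude + 1, max(0, col - exclude):col + exclude + 1] = False
    sidelobe = response[mask]
    return float((response[row, col] - sidelobe.mean()) / (sidelobe.std() + 1e-12))


def maybe_update(model, new_model, response, lr=0.01, psr_threshold=8.0):
    """Linear-interpolation filter update that is skipped when a low PSR suggests occlusion."""
    if psr(response) < psr_threshold:
        return model, False  # keep the old filter: the response is too flat to trust
    return (1.0 - lr) * model + lr * new_model, True


def gaussian_map(shape, center, sigma):
    """Synthetic response map with a single Gaussian peak, for demonstration only."""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    return np.exp(-((ys - center[0]) ** 2 + (xs - center[1]) ** 2) / (2.0 * sigma ** 2))


# Usage: a sharp "shallow-layer" map and a blurrier, slightly offset "deep-layer" map.
shallow = gaussian_map((50, 50), (20, 30), sigma=2.0)
deep = gaussian_map((50, 50), (22, 28), sigma=4.0)
fused, pos = fuse_responses([shallow, deep], errors=[0.2, 0.3])

model, new_model = np.zeros((2, 2)), np.ones((2, 2))
after_clean, updated_clean = maybe_update(model, new_model, fused)
noisy = np.random.default_rng(0).random((50, 50))  # flat, noisy map standing in for occlusion
after_occ, updated_occ = maybe_update(model, new_model, noisy)
```

Clipping negative coefficients to zero departs from textbook AdaBoost, which would instead flip a worse-than-chance learner; the sketch simply drops such layers from the fusion.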

Highlights

  • The proposed method is compared with eight mainstream algorithms: multiple experts using entropy minimization (MEEM) [18], circulant structure and the kernel method (CSK) [29], kernel correlation filter (KCF) [30], discriminative scale space tracking (DSST) [31], scale adaption with multiple features (SAMF) [32], hierarchical convolutional features (HCF) [41], CFNet [47], and discriminative CFs network (DCFNet) [46].

  • KCF, DSST, SAMF, and CSK use correlation filters based on hand-crafted features.

Introduction

Video surveillance is significant for public security [1], and object tracking is a key technology in video surveillance [2, 3]. Object tracking has many practical applications in video surveillance, human-computer interaction, and autonomous driving [4,5,6]. Given the initial position of the target, object tracking aims to estimate the target position throughout a video sequence. Most deep-learning-based trackers use CNNs pre-trained for image classification, such as AlexNet [9] and VGG [10], to extract target features. These trackers have high computational complexity because they need to extract features from both positive and negative samples.
