Abstract

Convolutional Neural Network (CNN) features have been widely used in visual tracking because of their powerful representation ability. The pooling layer is an important component of a CNN, but max, average, and min pooling capture only first-order information, which limits the discriminative ability of CNN features in some complex situations. In this paper, a high-order pooling layer is integrated into the VGG16 network for visual tracking. Specifically, a high-order covariance pooling layer replaces the last max-pooling layer to learn discriminative features and is trained on the ImageNet and CUB200-2011 data sets. In the tracking stage, feature maps from multiple levels are extracted as the appearance representation of the target. The extracted CNN features are then integrated into the correlation filter framework for online tracking. The experimental results show that the proposed algorithm achieves excellent performance in both success rate and tracking accuracy.
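
To illustrate the difference between first-order and high-order pooling mentioned above, the following is a minimal NumPy sketch, not the implementation used in the paper. It compares global average pooling, which keeps one mean per channel, with covariance pooling, which keeps the pairwise correlations between channels. The function names, the feature map size, and the normalization by the number of spatial positions are assumptions made for illustration only.

```python
import numpy as np


def global_average_pool(features):
    """First-order pooling: mean of each channel over all spatial positions."""
    c = features.shape[0]
    return features.reshape(c, -1).mean(axis=1)


def covariance_pool(features):
    """Second-order pooling: channel-by-channel covariance over spatial positions."""
    c = features.shape[0]
    x = features.reshape(c, -1)            # (C, N) with N = H * W
    x = x - x.mean(axis=1, keepdims=True)  # centre each channel
    n = x.shape[1]
    return (x @ x.T) / n                   # (C, C); dividing by N is an assumption


# Hypothetical feature map with 8 channels on a 7 x 7 spatial grid.
rng = np.random.default_rng(0)
feature_map = rng.standard_normal((8, 7, 7))

first_order = global_average_pool(feature_map)
second_order = covariance_pool(feature_map)
print(first_order.shape, second_order.shape)  # (8,) (8, 8)
```

The covariance output grows quadratically with the number of channels, which is one reason such a layer is usually placed late in the network, where the spatial resolution is already small.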
