Abstract

To address the complexity of multi-scale search and the low correlation between classification confidence and localization accuracy in visual tracking, we propose a novel visual tracking framework named ISiamCRAN. ISiamCRAN uses a modified ResNet-50 as the backbone network to extract deep features. These features are then fed into an improved classification-regression adaptive head, where a cross-correlation operation is performed on the deep features. Unlike existing trackers, we integrate a quality assessment branch into the classification-regression adaptive head to suppress low-quality predicted bounding boxes. Moreover, we replace the traditional label assignment strategy with an elliptical sample label assignment strategy, which allows the tracker to distinguish foreground from background more accurately. Finally, from the response maps produced by the cross-correlation operation, the position and scale of the target are predicted directly by a unified fully convolutional network in an anchor-free manner. Extensive experiments on the OTB100 and VOT2018 benchmarks show that ISiamCRAN outperforms traditional trackers and is robust to motion blur and scale variation. Our tracker runs at approximately 35 fps on a GPU.
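To make the described architecture more concrete, the following is a minimal PyTorch sketch of an anchor-free head with a quality assessment branch and an elliptical label assignment, assembled from the components the abstract names. Module names, channel sizes, the exponential regression activation, and the ellipse radii are illustrative assumptions rather than the paper's exact implementation.

```python
# Hypothetical sketch of an ISiamCRAN-style head: depth-wise cross-correlation,
# classification/regression branches, a quality branch, and elliptical labels.
import torch
import torch.nn as nn
import torch.nn.functional as F


def depthwise_xcorr(search_feat, template_feat):
    """Depth-wise cross-correlation between search-region and template features."""
    b, c, h, w = search_feat.shape
    kernel = template_feat.reshape(b * c, 1, *template_feat.shape[2:])
    out = F.conv2d(search_feat.reshape(1, b * c, h, w), kernel, groups=b * c)
    return out.reshape(b, c, out.shape[-2], out.shape[-1])


class ClsRegQualityHead(nn.Module):
    """Classification, box regression, and quality branches on correlated features."""
    def __init__(self, in_channels=256):
        super().__init__()
        self.tower = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, 3, padding=1),
            nn.BatchNorm2d(in_channels),
            nn.ReLU(inplace=True),
        )
        self.cls = nn.Conv2d(in_channels, 2, 3, padding=1)      # foreground / background
        self.reg = nn.Conv2d(in_channels, 4, 3, padding=1)      # l, t, r, b distances
        self.quality = nn.Conv2d(in_channels, 1, 3, padding=1)  # localization-quality score

    def forward(self, template_feat, search_feat):
        x = self.tower(depthwise_xcorr(search_feat, template_feat))
        # exp keeps regressed distances positive (assumed activation choice)
        return self.cls(x), torch.exp(self.reg(x)), torch.sigmoid(self.quality(x))


def elliptical_labels(points, box, r_outer=0.5, r_inner=0.25):
    """Label points inside the inner ellipse as positive (1), outside the outer
    ellipse as negative (0), and ignore the band in between (-1).
    `points` is (N, 2) in (x, y); `box` is (cx, cy, w, h). Radii fractions are assumed."""
    cx, cy, w, h = box
    dx, dy = points[:, 0] - cx, points[:, 1] - cy
    inner = (dx / (r_inner * w)) ** 2 + (dy / (r_inner * h)) ** 2
    outer = (dx / (r_outer * w)) ** 2 + (dy / (r_outer * h)) ** 2
    labels = torch.full((points.shape[0],), -1, dtype=torch.long)
    labels[outer > 1] = 0
    labels[inner <= 1] = 1
    return labels
```

At inference time, the classification score would be multiplied by the quality score before selecting the best location, which is how a quality branch typically suppresses low-quality bounding boxes.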
