Abstract
CNN-based trackers, especially those based on Siamese networks, have recently attracted considerable attention because of their relatively good performance and low computational cost. For many Siamese trackers, however, learning a generic object model from a large-scale dataset remains a challenging task. In this study, we introduce input noise as a regularizer in the training data to improve the generalization of the learned model. We propose an Input-Regularized Channel Attentional Siamese (IRCA-Siam) tracker that exhibits improved generalization compared to current state-of-the-art trackers. In particular, we exploit offline learning by introducing additive noise for input data augmentation to mitigate overfitting. We propose fusing features from the noisy and clean input channels, which improves target localization. Channel attention integrated into our framework helps to find more useful target features, resulting in further performance improvement. The proposed IRCA-Siam enhances discrimination between target and background and improves fault tolerance and generalization. An extensive experimental evaluation on six benchmark datasets, including OTB2013, OTB2015, TC128, UAV123, VOT2016 and VOT2017, demonstrates superior performance of the proposed IRCA-Siam tracker compared to 30 existing state-of-the-art trackers.
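The two key ingredients described in the abstract, additive input noise as a regularizer and channel attention over fused clean/noisy features, can be sketched roughly as follows. This is a minimal NumPy illustration, not the paper's implementation: the noise level `sigma`, the SE-style squeeze-and-excitation form of the attention, and the simple averaging fusion are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_input_noise(x, sigma=0.05):
    """Additive Gaussian noise as input regularization (sigma is an assumed hyperparameter)."""
    return x + rng.normal(0.0, sigma, size=x.shape)

def channel_attention(features, w1, w2):
    """SE-style channel attention: squeeze (global average pool), then excite
    (a two-layer bottleneck producing per-channel weights in (0, 1)).
    features: (C, H, W); w1: (C, C//r); w2: (C//r, C)."""
    squeezed = features.mean(axis=(1, 2))            # (C,) global average pooling
    hidden = np.maximum(0.0, squeezed @ w1)          # ReLU bottleneck
    weights = 1.0 / (1.0 + np.exp(-(hidden @ w2)))   # sigmoid channel weights
    return features * weights[:, None, None]         # rescale each channel

def fuse(clean_feat, noisy_feat):
    """Fuse the clean and noisy branches; plain averaging is an assumed fusion rule."""
    return 0.5 * (clean_feat + noisy_feat)

C, H, W, r = 8, 6, 6, 2
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C, C // r)) * 0.1
w2 = rng.standard_normal((C // r, C)) * 0.1

clean = channel_attention(x, w1, w2)
noisy = channel_attention(add_input_noise(x), w1, w2)
fused = fuse(clean, noisy)
print(fused.shape)  # (8, 6, 6)
```

During training, the noisy branch sees a perturbed copy of the same input, so the learned features must stay stable under small input perturbations, which is the regularization effect the abstract refers to.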
Highlights
Visual Object Tracking (VOT) is a promising and fundamental research area in computer vision applications including robotics [1], video understanding [2], video surveillance [3] and autonomous driving [4].
Despite significant progress in the field of VOT, it remains a challenging problem owing to diverse real-world challenges such as scale variations, occlusion, background clutter, fast motion and illumination variations.
The UAV123 benchmark contains 123 videos captured from an Unmanned Aerial Vehicle (UAV) at low altitude.
Summary
Visual Object Tracking (VOT) is a promising and fundamental research area in computer vision applications including robotics [1], video understanding [2], video surveillance [3] and autonomous driving [4]. Deep trackers benefit from pretrained deep neural networks and have shown outstanding performance [5,6,7,8,9,10]. These trackers use an off-the-shelf pretrained model as a backbone feature extractor, and the resulting deep features provide better discrimination. The pretrained models, such as VGGNet and AlexNet, are trained on ImageNet [11] for image classification tasks.
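For context on how a Siamese tracker uses such backbone features: the exemplar (template) features are cross-correlated with the search-region features, and the peak of the resulting response map indicates the target location. A minimal sketch using a naive sliding-window correlation in NumPy; the feature shapes and the planted-target setup are illustrative assumptions, not details from the paper.

```python
import numpy as np

def siamese_response(exemplar, search):
    """Cross-correlate exemplar features over search features to get a response map.
    exemplar: (C, h, w); search: (C, H, W) with H >= h and W >= w."""
    C, h, w = exemplar.shape
    _, H, W = search.shape
    out = np.zeros((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(exemplar * search[:, i:i + h, j:j + w])
    return out

rng = np.random.default_rng(1)
z = rng.standard_normal((4, 3, 3))          # exemplar (template) features
x = 0.1 * rng.standard_normal((4, 10, 12))  # search-region features (mostly noise)
x[:, 2:5, 5:8] += 3.0 * z                   # plant the target at offset (2, 5)

resp = siamese_response(z, x)
peak = np.unravel_index(resp.argmax(), resp.shape)
print(peak)  # the peak should coincide with the planted offset (2, 5)
```

Real Siamese trackers implement this correlation as a convolution on GPU and learn the backbone end-to-end, but the localization principle is the same: the response is largest where the search features best match the template.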