Visual object tracking is a critical component of many computer vision tasks such as motion analysis, event detection, and action recognition. Recently, Siamese network-based trackers have gained enormous popularity in the tracking field owing to their favorable accuracy and efficiency. However, distraction caused by semantic background clutter and the overly simple modeling of target templates often lead to performance degradation. In this study, we propose two modules, a target objectness model and a target template model, built on top of existing Siamese network-based trackers to address these issues. The target objectness model estimates the probability that each pixel in the search area belongs to the tracked target, based on the color distributions of the foreground and background regions. The resulting target likelihood map is masked onto the original response map, adjusting the final response to focus on the target. This enlarges the discrimination between the tracked target and the surrounding background, thereby alleviating the distraction problem. The target template model employs a Gaussian mixture model to encode target appearance variations, where each component represents a different aspect of the target and the component weights are learned and dynamically updated. The proposed mixture model enhances the diversity of target samples while reducing redundancy among them. To validate the effectiveness of the proposed method, we perform extensive experiments on four widely used benchmarks: OTB100, VOT2016, TC128, and UAV123. The experimental results demonstrate that our algorithm achieves favorable performance compared with many state-of-the-art trackers while maintaining real-time tracking speed.
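For illustration only, the following minimal Python sketch (our own simplification, not the paper's released code) shows how a per-pixel target likelihood map could be obtained from foreground/background color histograms via Bayes' rule and then masked onto a tracker's response map. The bin count, foreground prior, and blending weight `alpha` are hypothetical parameters.

```python
import numpy as np

def color_histogram(patch, bins=16):
    """RGB histogram of an HxWx3 uint8 patch, normalized to a probability distribution."""
    idx = (patch // (256 // bins)).reshape(-1, 3).astype(np.int64)
    flat = idx[:, 0] * bins * bins + idx[:, 1] * bins + idx[:, 2]
    hist = np.bincount(flat, minlength=bins ** 3).astype(np.float64)
    return hist / (hist.sum() + 1e-12)

def objectness_map(search, fg_hist, bg_hist, bins=16, prior=0.5):
    """Per-pixel posterior P(target | color) over the search region (Bayes' rule)."""
    idx = (search // (256 // bins)).astype(np.int64)
    flat = idx[..., 0] * bins * bins + idx[..., 1] * bins + idx[..., 2]
    p_fg, p_bg = fg_hist[flat], bg_hist[flat]
    return (p_fg * prior) / (p_fg * prior + p_bg * (1.0 - prior) + 1e-12)

def refine_response(response, likelihood, alpha=0.7):
    """Blend the original response map with one masked by the target likelihood map."""
    return alpha * response * likelihood + (1.0 - alpha) * response
```

In this sketch, `fg_hist` and `bg_hist` would be built from the annotated target box and its surrounding background in the first (or most recent) frame, and `response` is assumed to be resized to the spatial resolution of the search region.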
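Likewise, a hedged sketch of an online template pool in the spirit of the Gaussian mixture template model: new target samples are either merged into the nearest component or open a new one, and component weights are renormalized after each update. The distance rule, learning rate `lr`, and pool size are illustrative choices, not the paper's exact formulation.

```python
import numpy as np

class TemplatePool:
    """Small pool of template components with dynamically updated weights."""

    def __init__(self, max_components=5, lr=0.1):
        self.max_components = max_components
        self.lr = lr
        self.components = []  # one feature array per appearance component
        self.weights = []     # one weight per component

    def update(self, feat):
        """Absorb a new target sample: add a component or merge with the closest one."""
        if len(self.components) < self.max_components:
            self.components.append(feat.astype(np.float64).copy())
            self.weights.append(1.0)
        else:
            dists = [np.linalg.norm(feat - c) for c in self.components]
            j = int(np.argmin(dists))
            # Move the closest component toward the new sample and raise its weight.
            self.components[j] = (1 - self.lr) * self.components[j] + self.lr * feat
            self.weights[j] += 1.0
        total = sum(self.weights)
        self.weights = [w / total for w in self.weights]

    def template(self):
        """Weighted combination of components, used as the matching template."""
        return sum(w * c for w, c in zip(self.weights, self.components))
```

Keeping several weighted components rather than a single running-average template is what lets the model cover distinct appearance modes while the merging step limits redundancy between stored samples.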