Visual object tracking aims to locate an object of interest in a sequence of consecutive video frames and is widely applied in high-level computer vision tasks such as intelligent video surveillance and robotics. It remains challenging for tracking methods to handle, in real-time video, large target appearance variations caused by pose deformation, fast motion, occlusion, and changing surroundings. In this paper, inspired by the human attention cognitive saliency model, we propose a visual tracking method based on salient superpixels that integrates target appearance similarity with cognitive saliency to support location inference and appearance model updating. Superpixel saliency is detected with a graph model and manifold ranking. We cluster the superpixels of the first four target boxes into a set corresponding to the object foreground and model the target appearance with color descriptors. During tracking, relevance is computed between candidate superpixels and the target appearance set. We also propose an iterative threshold segmentation method that separates foreground from background superpixels based on saliency and relevance. To increase the accuracy of location inference, we exploit a particle filter in both the confidence estimation and sampling procedures. We compared our method with existing techniques on the OTB100 dataset in terms of precision based on center location error and success rate based on overlap, and the experimental results show that our method achieves substantially better performance. These promising results indicate that the proposed salient superpixel-based approach is robust to deformation, occlusion, and other challenges in object tracking.
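The iterative threshold segmentation step can be illustrated with a minimal sketch. This is not the authors' exact algorithm; it assumes an ISODATA-style update (threshold set to the midpoint of the foreground and background means) applied to a weighted combination of saliency and relevance scores, where the weight `alpha` and the helper names are hypothetical:

```python
def iterative_threshold_segmentation(scores, tol=1e-6, max_iter=100):
    """ISODATA-style iterative thresholding: start from the global mean,
    then repeatedly set the threshold to the midpoint of the means of the
    two resulting classes until it stabilizes."""
    t = sum(scores) / len(scores)
    for _ in range(max_iter):
        fg = [s for s in scores if s > t]
        bg = [s for s in scores if s <= t]
        if not fg or not bg:          # degenerate split; keep current threshold
            break
        new_t = 0.5 * (sum(fg) / len(fg) + sum(bg) / len(bg))
        if abs(new_t - t) < tol:
            return new_t
        t = new_t
    return t

def classify_superpixels(saliency, relevance, alpha=0.5):
    """Label each superpixel as foreground (True) or background (False).
    alpha is a hypothetical weight blending saliency and relevance."""
    scores = [alpha * s + (1.0 - alpha) * r for s, r in zip(saliency, relevance)]
    t = iterative_threshold_segmentation(scores)
    return [s > t for s in scores]
```

For well-separated scores (e.g. saliency `[0.9, 0.8, 0.1, 0.2]` and relevance `[0.85, 0.9, 0.05, 0.15]`), the first two superpixels are labeled foreground and the last two background.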
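The particle filter used for location inference follows the standard predict–weight–estimate–resample loop. The sketch below is a generic bootstrap filter, not the paper's implementation: the confidence of each candidate location (which in the paper would come from salient-superpixel relevance) is abstracted as a caller-supplied `confidence_fn`, and `motion_std` is an assumed Gaussian motion-noise parameter:

```python
import random

def particle_filter_step(particles, confidence_fn, motion_std=5.0):
    """One step of a bootstrap particle filter over 2-D locations.
    particles: list of (x, y) tuples from the previous frame.
    confidence_fn: maps a candidate (x, y) to a non-negative score.
    Returns (resampled_particles, estimated_location)."""
    # Predict: diffuse each particle with Gaussian motion noise.
    moved = [(x + random.gauss(0, motion_std), y + random.gauss(0, motion_std))
             for x, y in particles]
    # Weight: score each candidate location and normalize.
    w = [confidence_fn(p) for p in moved]
    total = sum(w) or 1.0
    w = [wi / total for wi in w]
    # Estimate: weighted mean of the candidate locations.
    cx = sum(wi * p[0] for wi, p in zip(w, moved))
    cy = sum(wi * p[1] for wi, p in zip(w, moved))
    # Resample: draw particles proportionally to their weights.
    resampled = random.choices(moved, weights=w, k=len(moved))
    return resampled, (cx, cy)
```

With a confidence function peaked at the true target location, repeated steps concentrate the particle cloud there; the weighted mean then serves as the frame's location estimate.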