Abstract

Generic visual object tracking is challenging because of difficulties such as scale variation and deformation. To address these problems, we propose a novel multi-scale selective kernel module for tracking. It contains small-scale and large-scale branches that model the target at different scales, and an attention mechanism that captures more effective appearance information about the target. Within the module, we cascade multiple small-scale convolutional blocks to form an equivalent large-scale branch, which extracts large-scale features of the target effectively. In addition, we present a hybrid feature selection strategy that extracts significant information from features at different scales. Building on a strong existing segmentation-based tracking framework, we propose a novel tracking network that applies our module at multiple places in the up-sampling phase to construct a more accurate and robust appearance model. Extensive experiments show that our tracker outperforms other state-of-the-art trackers on multiple challenging benchmarks, including VOT2018, TrackingNet, DAVIS-2017, and YouTube-VOS-2018, while running in real time.
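The idea of cascading small-scale convolutional blocks into an equivalent large-scale branch can be illustrated with simple receptive-field arithmetic. The sketch below is only an illustration: the abstract does not state kernel sizes or channel widths, so the 3x3 kernels, the 64-channel width, and the helper names `stacked_receptive_field` and `conv_weights` are assumptions made here. It relies on the standard result that n stacked stride-1, undilated k x k convolutions cover a receptive field of 1 + n(k - 1) pixels per side.

```python
def stacked_receptive_field(kernel_size: int, num_layers: int) -> int:
    """Side length of the receptive field of `num_layers` stacked
    stride-1, undilated square convolutions with the same kernel size."""
    return 1 + num_layers * (kernel_size - 1)


def conv_weights(kernel_size: int, channels: int, num_layers: int = 1) -> int:
    """Weight count (biases ignored) of `num_layers` square convolutions,
    each mapping `channels` input channels to `channels` output channels."""
    return num_layers * kernel_size * kernel_size * channels * channels


if __name__ == "__main__":
    channels = 64  # hypothetical width, not taken from the paper
    for layers in (1, 2, 3):
        side = stacked_receptive_field(3, layers)
        cascaded = conv_weights(3, channels, layers)
        single = conv_weights(side, channels)
        print(
            f"{layers} x (3x3) covers {side}x{side}: "
            f"{cascaded} weights vs {single} for one {side}x{side} kernel"
        )
```

Under these assumptions, two cascaded 3x3 layers cover the same 5x5 region as a single 5x5 kernel while using fewer weights, which is one common motivation for building a large-scale branch from small-scale blocks.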
