Abstract

Sliding window and candidate sampling are two widely used search strategies for visual object tracking, but both run far slower than real time. By treating tracking as a three-step decision-making process, we develop a novel tracking network that explores only three small subsets of candidate regions and thereby localizes the target object across the frames of a video in real time. A convolutional neural network agent is formulated to interact with a video over time, and two action-value functions are used to learn, off-line, a favorable policy that determines the best action for visual object tracking. The model is trained collaboratively, combining action classification with cumulative reward approximation from reinforcement learning. We evaluate the proposed tracker against a number of state-of-the-art trackers on three popular tracking benchmarks: OTB-2013, OTB-2015, and VOT2017. The experimental results demonstrate that the proposed method achieves very competitive performance in real-time object tracking.
