Studies have shown that contextual information can improve the robustness of trackers. However, trackers based on convolutional neural networks (CNNs) capture only local features, which limits their performance. We propose a novel relevant context block (RCB), which employs graph convolutional networks to capture the relevant context. Specifically, for each query position (unit), it selects as nodes the $k$ largest contributors, those containing meaningful and discriminative contextual information, and updates the nodes by aggregating the differences between the query position and its contributors. This operation can be easily incorporated into existing networks and trained end-to-end with standard backpropagation. To verify the effectiveness of RCB, we apply it to two trackers, SiamFC and GlobalTrack; the improved trackers are referred to as Siam-RCB and GlobalTrack-RCB, respectively. Extensive experiments on OTB, VOT, UAV123, LaSOT, TrackingNet, OxUvA, and VOT2018LT show the superiority of our method. For example, Siam-RCB outperforms SiamFC by a large margin (up to 11.2% in success score and 7.8% in precision score) on the OTB-100 benchmark.
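The selection and aggregation step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how contributions are scored, how the differences are weighted, or how the result is merged back. The dot-product affinity, the softmax weighting, the residual addition, and the function name `relevant_context_update` are all assumptions made for illustration, and no learnable weights are included.

```python
import math


def relevant_context_update(features, k):
    """Update each position from its k highest-scoring other positions.

    features: list of N feature vectors (lists of floats), one per position.
    Returns a new list of N vectors with the same dimensionality.
    Scoring, weighting, and the residual merge are assumptions, not
    details taken from the abstract.
    """
    updated = []
    for i, query in enumerate(features):
        # Assumed affinity: dot product between the query and each other position.
        scores = [
            (sum(a * b for a, b in zip(query, other)), j)
            for j, other in enumerate(features)
            if j != i
        ]
        # Keep the k largest contributors as graph nodes for this query.
        top = sorted(scores, reverse=True)[:max(k, 0)]
        if not top:
            # No contributors available: leave the query unchanged.
            updated.append(list(query))
            continue

        # Assumed weighting: softmax over the selected scores.
        peak = max(s for s, _ in top)
        exps = [math.exp(s - peak) for s, _ in top]
        total = sum(exps)

        # Aggregate the differences between each contributor and the query.
        agg = [0.0] * len(query)
        for e, (_, j) in zip(exps, top):
            for d in range(len(query)):
                agg[d] += (e / total) * (features[j][d] - query[d])

        # Assumed residual merge of the aggregated context into the query.
        updated.append([q + a for q, a in zip(query, agg)])
    return updated
```

In a real network the positions would be the spatial units of a CNN feature map and the aggregation would pass through learnable graph-convolution weights. The sketch only shows the data flow of top-$k$ selection followed by difference aggregation.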