Abstract

RGB-T object tracking is a branch of visual tracking that has been widely applied in fields such as intelligent transportation and urban monitoring. Owing to interference from background clutter and occlusion, existing trackers still suffer from unreasonable modal fusion strategies, insufficient feature extraction, and loss of semantic information. To address these problems, we propose a residual learning-based two-stream network for RGB-T object tracking. The overall feature extraction network consists of three branches, in which multi-layer convolutions extract the features of visible, thermal, and fused images, respectively. First, to improve the effectiveness of feature extraction, a weight generation module is designed to compute hierarchical feature weights that guide the direction of feature fusion. Then, residual blocks replace the single-layer convolutions to deepen the network, so that deeper semantic features are learned and the loss of semantic information is alleviated. Finally, a loss function with a penalty term is developed to steer the network toward the best joint tracking performance of the two modalities, which overcomes the negative impact of the poorer modality on model training. Experiments on public RGB-T datasets show that our algorithm outperforms recent state-of-the-art trackers in terms of both precision rate and success rate, improving these two metrics by 0.4% and 1.5%, respectively, over the second-best tracker. Our code is available at https://github.com/MinjieWan/Residual-learning-based-two-stream-network-for-RGB-T-object-tracking.
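
The abstract names two architectural components without giving their details: residual blocks that replace single-layer convolutions, and a weight generation module that weights the visible, thermal, and fused branches before fusion. The following is a minimal PyTorch sketch of one plausible reading of those two components; all layer sizes, module names, and the pooling-plus-softmax weighting scheme are assumptions for illustration, not the authors' published implementation.

```python
# Hypothetical sketch of a residual block and a branch-weighting module,
# assuming three feature branches (visible, thermal, fused) of equal shape.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ResidualBlock(nn.Module):
    """Two 3x3 convolutions with an identity shortcut; the skip connection
    lets the block deepen the network while mitigating semantic loss."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)  # residual connection


class WeightGenerationModule(nn.Module):
    """Produces normalized per-branch weights from globally pooled features
    and fuses the branches as a weighted sum (an assumed design)."""

    def __init__(self, channels, num_branches=3):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(num_branches * channels, channels),
            nn.ReLU(inplace=True),
            nn.Linear(channels, num_branches),
        )

    def forward(self, feats):
        # feats: list of (B, C, H, W) tensors, one per branch
        pooled = torch.cat([f.mean(dim=(2, 3)) for f in feats], dim=1)
        w = torch.softmax(self.fc(pooled), dim=1)  # (B, num_branches)
        return sum(w[:, i, None, None, None] * f for i, f in enumerate(feats))


if __name__ == "__main__":
    rgb, thermal, fused_in = (torch.randn(2, 64, 32, 32) for _ in range(3))
    block, wgm = ResidualBlock(64), WeightGenerationModule(64)
    fused = wgm([block(rgb), block(thermal), block(fused_in)])
    print(fused.shape)  # torch.Size([2, 64, 32, 32])
```

The weighted-sum fusion above is one common way to let learned weights "guide the direction of feature fusion"; the paper's actual module may combine hierarchical features differently.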
