Abstract

Visual object tracking is a fundamental task in computer vision. Despite its rapid development, tracking that relies on visible-light images alone is unreliable in some scenarios. Because visible-light and thermal infrared images have complementary imaging characteristics, using them as a joint input for tracking has attracted increasing attention; this paradigm is known as RGBT tracking. Existing RGBT trackers can be divided into image-level, feature-level, and response-level fusion trackers. Compared with the first two, response-level fusion can exploit deeper dual-modality image information, but most existing response-level methods rely on traditional (non-deep) tracking frameworks and introduce modality weights at inappropriate stages. Motivated by these observations, we propose a response-level fusion tracking algorithm based on deep learning. We move the weight assignment into the feature extraction stage, for which we design a joint modal channel attention module. We adopt the Siamese framework and extend it into a dual Siamese subnetwork; we also improve the region proposal subnetwork and propose a strategy for fusing the predicted position maps of the two modalities. To verify the performance of our algorithm, we conducted experiments on two tracking benchmarks. Our algorithm performs well on both and runs at 116 frames per second, far exceeding the real-time requirement of 25 frames per second.
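To make the idea of weighting modalities during feature extraction concrete, the following is a minimal sketch of what a joint modal channel attention module could look like. The abstract does not specify the module's internals, so the class name, layer sizes, and the squeeze-and-excitation-style gating below are illustrative assumptions, not the paper's actual design.

```python
# Hypothetical sketch of a joint modal channel attention module.
# The exact design is not given in the abstract; the gating scheme,
# names, and dimensions here are illustrative assumptions.
import torch
import torch.nn as nn


class JointModalChannelAttention(nn.Module):
    """Re-weights the channels of RGB and thermal feature maps jointly,
    so modality weighting happens in the feature extraction stage."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        # Squeeze both modalities into one descriptor, then predict
        # per-channel gates for each modality (2 * channels outputs).
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(2 * channels, 2 * channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(2 * channels // reduction, 2 * channels),
            nn.Sigmoid(),
        )

    def forward(self, rgb: torch.Tensor, tir: torch.Tensor):
        b, c, _, _ = rgb.shape
        # Global average pooling of both modalities, concatenated
        # so each modality's gates are conditioned on both inputs.
        squeezed = torch.cat(
            [self.pool(rgb).flatten(1), self.pool(tir).flatten(1)], dim=1
        )
        weights = self.fc(squeezed)              # shape: (b, 2c)
        w_rgb, w_tir = weights.split(c, dim=1)   # (b, c) each
        # Re-weight each modality's channels with the joint gates.
        return (rgb * w_rgb.view(b, c, 1, 1),
                tir * w_tir.view(b, c, 1, 1))


if __name__ == "__main__":
    attn = JointModalChannelAttention(channels=256)
    rgb_feat = torch.randn(2, 256, 31, 31)   # RGB branch features
    tir_feat = torch.randn(2, 256, 31, 31)   # thermal branch features
    rgb_out, tir_out = attn(rgb_feat, tir_feat)
    print(rgb_out.shape, tir_out.shape)      # both (2, 256, 31, 31)
```

In a sketch like this, each modality's channel gates depend on both inputs, which is one plausible reading of "joint" modal attention; the weighted features would then feed the two Siamese branches before the response maps are fused.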
