Abstract

We address the problem of multi-modal object tracking in video and explore options for fusing the complementary information conveyed by the visible (RGB) and thermal infrared (TIR) modalities at the pixel, feature and decision levels. At the pixel level, in contrast to existing approaches, we propose and develop a paradigm for combining multi-modal information through image fusion. At the feature level, two fusion strategies are investigated for completeness: an attention-based online fusion strategy and an offline-trained fusion block. At the decision level, a novel fusion strategy is put forward, inspired by the success of the simple averaging configuration, which has shown considerable promise. The effectiveness of the proposed decision-level fusion strategy owes to a number of innovative contributions, including dynamic weighting of the RGB and TIR contributions and a linear template update operation. A variant of the proposed decision fusion method produced the winning tracker at the Visual Object Tracking Challenge 2020 (VOT-RGBT2020). A comprehensive comparison of the pixel-level and feature-level fusion strategies with the proposed decision-level fusion method highlights the advantages of fusing multi-modal information at the decision score level. Extensive experimental results on five challenging datasets, i.e., GTOT, VOT-RGBT2019, RGBT234, LasHeR and VOT-RGBT2020, demonstrate the effectiveness and robustness of the proposed method compared to state-of-the-art approaches. The code is available at https://github.com/Zhangyong-Tang/DFAT.
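
To make the decision-level idea concrete, the following is a minimal sketch of score-map fusion with dynamic modality weights, followed by a linear template update. It is not the authors' implementation: the abstract does not specify the weighting rule, so the peak-response weighting used here, the function names (`dynamic_weights`, `fuse_scores`, `linear_template_update`) and the `rate` parameter are all assumptions made for illustration.

```python
from typing import List, Tuple

Map = List[List[float]]  # a 2-D response (score) map as nested lists


def peak(score_map: Map) -> float:
    """Return the maximum response in a score map."""
    return max(max(row) for row in score_map)


def dynamic_weights(rgb: Map, tir: Map) -> Tuple[float, float]:
    """Assumed weighting rule: weight each modality by its peak response.

    Falls back to simple averaging when neither peak is positive.
    """
    p_rgb = max(peak(rgb), 0.0)
    p_tir = max(peak(tir), 0.0)
    total = p_rgb + p_tir
    if total == 0.0:
        return 0.5, 0.5
    return p_rgb / total, p_tir / total


def fuse_scores(rgb: Map, tir: Map) -> Map:
    """Fuse two equally sized score maps as a weighted element-wise sum."""
    w_rgb, w_tir = dynamic_weights(rgb, tir)
    return [
        [w_rgb * a + w_tir * b for a, b in zip(rgb_row, tir_row)]
        for rgb_row, tir_row in zip(rgb, tir)
    ]


def linear_template_update(
    template: List[float], current: List[float], rate: float = 0.1
) -> List[float]:
    """Linear interpolation between the old template and the current one.

    `rate` is a hypothetical update rate in [0, 1].
    """
    return [(1.0 - rate) * t + rate * c for t, c in zip(template, current)]
```

With fixed weights of 0.5 each, `fuse_scores` reduces to the simple averaging configuration mentioned above; the dynamic variant shifts trust towards whichever modality responds more strongly in the current frame.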
