Abstract
While RGB-based trackers have made impressive progress, they still falter in complex scenarios, motivating multi-modal tracking strategies that leverage auxiliary modalities. However, most existing methods do not sufficiently explore and exchange the complementary information within and between modalities. To address this, we propose a Modality Interaction Network (MINet), a unified framework for multi-modal tracking. It consists of a Modality Representation Module (MRM) and a Memory Query Module (MQM). MRM enforces communication between modalities through a Modality Interaction Module (MIM) and fuses multi-modal information through a Modality Fuse Module (MFM) to generate more discriminative representations. MQM maintains historical multi-modal information and builds long-range dependencies between the current and historical targets, which enhances tracking performance, especially when targets undergo significant deformation or occlusion. To verify effectiveness across different multi-modal tracking paradigms, we conduct extensive experiments on RGB-D, RGB-T, and RGB-E tracking. The results demonstrate that the proposed MINet achieves outstanding performance compared to state-of-the-art trackers on these tasks, outperforming them by 1% in RGB-D, 1.2% in RGB-T, and 1% in RGB-E tracking performance, respectively.
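The abstract does not give implementation details, so the following is only a minimal sketch of how the described data flow (MIM interaction, MFM fusion, and MQM memory querying) could be organized. The specific layer choices here, cross-attention for MIM and MQM and a concatenation-plus-linear fusion for MFM, are illustrative assumptions, not the paper's actual design.

```python
# Hypothetical sketch of the MINet data flow described in the abstract.
# Cross-attention for MIM/MQM and concat+linear fusion for MFM are
# assumptions for illustration; they are not taken from the paper.
import torch
import torch.nn as nn


class MRM(nn.Module):
    """Modality Representation Module: inter-modality interaction + fusion."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        # MIM (assumed): bidirectional cross-attention between RGB tokens
        # and auxiliary-modality tokens (depth / thermal / event).
        self.rgb_to_aux = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.aux_to_rgb = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # MFM (assumed): fuse the interacted features into one representation.
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, rgb: torch.Tensor, aux: torch.Tensor) -> torch.Tensor:
        rgb_i, _ = self.aux_to_rgb(rgb, aux, aux)  # RGB attends to the auxiliary modality
        aux_i, _ = self.rgb_to_aux(aux, rgb, rgb)  # auxiliary modality attends to RGB
        return self.fuse(torch.cat([rgb_i, aux_i], dim=-1))


class MQM(nn.Module):
    """Memory Query Module: query a bank of historical fused target features."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.query_memory = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, current: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
        # Long-range dependencies: current-frame tokens attend to stored
        # historical target representations.
        out, _ = self.query_memory(current, memory, memory)
        return current + out


# Usage sketch: fuse the two modalities, then refine with the memory bank.
rgb = torch.randn(1, 256, 512)      # current-frame RGB tokens
aux = torch.randn(1, 256, 512)      # current-frame auxiliary-modality tokens
memory = torch.randn(1, 1024, 512)  # stored historical fused features
fused = MRM(512)(rgb, aux)
refined = MQM(512)(fused, memory)
```

In this reading, the memory bank would be updated with fused features from past frames so that the MQM cross-attention can recover the target after deformation or occlusion; how the bank is actually built and pruned is not specified in the abstract.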