Abstract
RGB-T tracking has been widely applied in fields such as robotics, surveillance, and autonomous driving. In contrast, the development of RGB-D tracking has been comparatively slow. Existing RGB-D trackers still follow the paradigm of extracting spatial information between the target template and the search region for appearance feature matching. However, this paradigm struggles to capture target appearance variations even in traditional RGB tracking, let alone in more complex multimodal object tracking datasets. To improve the performance of RGB-D trackers, we propose TABBTrack, a novel temporal adaptive bidirectional bridging framework for RGB-D tracking. TABBTrack employs a three-stream architecture that incorporates temporal features and uses a bidirectional bridging module to iteratively bridge RGB and Depth modality information, enabling full cross-modal feature interaction. In addition, we update temporal information with low computational complexity. Extensive experiments on four popular RGB-D tracking benchmarks demonstrate that our method achieves state-of-the-art performance while running at real-time speed.
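To make the bidirectional bridging idea concrete, the following is a minimal, hypothetical sketch assuming cross-attention is used to exchange information between the RGB and Depth token streams in both directions. All names (`BidirectionalBridge`, `dim`, `num_heads`) and the number of bridging iterations are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of a bidirectional bridging block. Assumes each modality
# stream queries the other via cross-attention; the paper's module may differ.
import torch
import torch.nn as nn

class BidirectionalBridge(nn.Module):
    """Exchanges features between the RGB and Depth streams in both directions."""
    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        # One cross-attention per direction: RGB -> Depth and Depth -> RGB.
        self.rgb_to_depth = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.depth_to_rgb = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_rgb = nn.LayerNorm(dim)
        self.norm_depth = nn.LayerNorm(dim)

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor):
        # rgb, depth: (batch, tokens, dim) token sequences from each stream.
        # Each stream attends to the other, then adds the result residually.
        rgb_out, _ = self.depth_to_rgb(query=rgb, key=depth, value=depth)
        depth_out, _ = self.rgb_to_depth(query=depth, key=rgb, value=rgb)
        rgb = self.norm_rgb(rgb + rgb_out)
        depth = self.norm_depth(depth + depth_out)
        return rgb, depth

# Iteratively bridging the two streams, as the abstract describes:
rgb_feat = torch.randn(1, 64, 256)    # dummy RGB tokens
depth_feat = torch.randn(1, 64, 256)  # dummy Depth tokens
bridge = BidirectionalBridge()
for _ in range(3):  # iteration count is an assumption
    rgb_feat, depth_feat = bridge(rgb_feat, depth_feat)
```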