Abstract

In RGB-T tracking, the multi-modal data contain rich spatial relationships between the target and its background, and these relationships remain strongly consistent across successive frames; both properties are crucial for boosting tracking performance. However, most existing RGB-T trackers overlook such multi-modal spatial relationships and temporal consistencies within RGB-T videos, which hinders robust tracking and practical application in complex scenarios. In this paper, we propose a novel Multi-modal Spatial-Temporal Context (MMSTC) network for RGB-T tracking, which employs a Transformer architecture to construct reliable multi-modal spatial context information and to effectively propagate temporal context information. Specifically, a Multi-modal Transformer Encoder (MMTE) is designed to encode reliable multi-modal spatial contexts and to fuse multi-modal features. Furthermore, a Quality-aware Transformer Decoder (QATD) is proposed to effectively propagate tracking cues from historical frames to the current frame, which facilitates the object search process. Moreover, the proposed MMSTC network can be easily extended to various tracking frameworks. New state-of-the-art results on five prevalent RGB-T tracking benchmarks demonstrate the superiority of our proposed trackers over existing ones.
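
The abstract describes the architecture only at a high level. The following is a minimal, hypothetical PyTorch sketch of how such a pipeline could be wired, assuming standard Transformer encoder/decoder blocks stand in for the MMTE and QATD; all class names, dimensions, and the simple sigmoid quality weighting are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MMSTCSketch(nn.Module):
    """Hypothetical sketch of an MMSTC-style pipeline.

    The Multi-modal Transformer Encoder (MMTE) is approximated by a standard
    TransformerEncoder over concatenated RGB and thermal tokens, and the
    Quality-aware Transformer Decoder (QATD) by a TransformerDecoder that
    attends from current-frame tokens to quality-weighted historical tokens.
    Every design choice here is an assumption, not the paper's implementation.
    """

    def __init__(self, dim=256, heads=8, depth=4):
        super().__init__()
        enc_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        dec_layer = nn.TransformerDecoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.mmte = nn.TransformerEncoder(enc_layer, num_layers=depth)  # multi-modal spatial context and fusion
        self.qatd = nn.TransformerDecoder(dec_layer, num_layers=depth)  # temporal context propagation
        self.quality_head = nn.Linear(dim, 1)                           # assumed per-token quality estimator

    def forward(self, rgb_tokens, tir_tokens, history_tokens):
        # rgb_tokens, tir_tokens: (B, N, dim) current-frame features per modality
        # history_tokens: (B, M, dim) fused tokens cached from previous frames
        fused = self.mmte(torch.cat([rgb_tokens, tir_tokens], dim=1))   # encode multi-modal spatial context
        quality = torch.sigmoid(self.quality_head(history_tokens))      # score reliability of historical cues
        memory = history_tokens * quality                               # down-weight unreliable temporal cues
        return self.qatd(fused, memory)                                 # propagate temporal context to current frame
```

As a usage check, feeding `rgb = torch.randn(2, 64, 256)`, `tir = torch.randn(2, 64, 256)`, and `hist = torch.randn(2, 64, 256)` through `MMSTCSketch()` yields fused current-frame tokens of shape `(2, 128, 256)` that a downstream tracking head could consume.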
