Abstract

Siamese tracking is one of the most promising object tracking paradigms today owing to its balance of accuracy and speed. However, it still performs poorly under challenging conditions such as low light or extreme weather. This is caused by the inherent limitations of visible images, and a common remedy is to introduce thermal infrared data to improve the robustness of tracking. However, most existing RGBT trackers are variants of MDNet (H. Nam and B. Han, "Learning multi-domain convolutional neural networks for visual tracking," CVPR 2016, pp. 4293–4302), which suffer from significant limitations in runtime efficiency. Meanwhile, the potential of Siamese tracking for RGBT tracking has not been effectively exploited because of its reliance on large-scale training data. To resolve this dilemma, we propose an end-to-end Siamese RGBT tracking framework based on cross-modal feature enhancement and self-attention (SiamFEA). Drawing on the idea of transfer learning, we employ local fine-tuning to reduce the dependence on large-scale RGBT data and verify the feasibility of this approach; we then propose a reliable fusion approach to efficiently combine the features of the two modalities. Specifically, we first introduce a cross-modal feature enhancement module to exploit the complementary properties of the dual modalities, and then capture non-local attention along the channel and spatial dimensions for adaptive weighted fusion. Our network is trained end-to-end on the LasHeR training set (C. Li, W. Xue, Y. Jia, Z. Qu, B. Luo, J. Tang, "LasHeR: A large-scale high-diversity benchmark for RGBT tracking," CoRR abs/2104.13202, 2021) and achieves new state-of-the-art results on GTOT (C. Li, H. Cheng, S. Hu, X. Liu, J. Tang, L. Lin, "Learning collaborative sparse representation for grayscale-thermal tracking," IEEE Trans. Image Process. 25(12), 2016, pp. 5743–5756), RGBT234 (C. Li, X. Liang, Y. Lu, N. Zhao, J. Tang, "RGB-T object tracking: Benchmark and baseline," Pattern Recognition 96, 2019, p. 106977), and LasHeR while running in real time.
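To make the fusion step concrete, the sketch below shows one way the described pipeline could look in PyTorch: non-local attention computed along the spatial and channel dimensions of the concatenated RGB/thermal features (in the spirit of dual-attention blocks), followed by per-pixel modality weights for adaptive weighted fusion. This is a minimal illustration under our own assumptions; all module and variable names are hypothetical and are not taken from the SiamFEA implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class NonLocalChannelSpatialFusion(nn.Module):
    """Illustrative fusion head: non-local (self-)attention over spatial
    positions and over channels, then adaptive per-pixel weighting of the
    RGB and thermal feature maps. Names are hypothetical, not the paper's."""

    def __init__(self, channels: int):
        super().__init__()
        inner = max(2 * channels // 8, 1)
        # 1x1 projections for the spatial (position) attention branch.
        self.query = nn.Conv2d(2 * channels, inner, kernel_size=1)
        self.key = nn.Conv2d(2 * channels, inner, kernel_size=1)
        self.value = nn.Conv2d(2 * channels, 2 * channels, kernel_size=1)
        # Learnable residual scales, zero-initialized as in non-local blocks.
        self.gamma = nn.Parameter(torch.zeros(1))
        self.beta = nn.Parameter(torch.zeros(1))
        # Produces one fusion-weight map per modality.
        self.weight_head = nn.Conv2d(2 * channels, 2, kernel_size=1)

    def forward(self, f_rgb: torch.Tensor, f_tir: torch.Tensor) -> torch.Tensor:
        x = torch.cat([f_rgb, f_tir], dim=1)                  # (B, 2C, H, W)
        b, c, h, w = x.shape
        n = h * w

        # Spatial non-local attention: every location attends to all others.
        q = self.query(x).flatten(2)                          # (B, C', N)
        k = self.key(x).flatten(2)                            # (B, C', N)
        v = self.value(x).flatten(2)                          # (B, 2C, N)
        attn_s = F.softmax(q.transpose(1, 2) @ k, dim=-1)     # (B, N, N)
        out_s = (v @ attn_s.transpose(1, 2)).view(b, c, h, w)

        # Channel non-local attention via a channel-affinity (Gram) matrix.
        flat = x.flatten(2)                                   # (B, 2C, N)
        attn_c = F.softmax(flat @ flat.transpose(1, 2), dim=-1)  # (B, 2C, 2C)
        out_c = (attn_c @ flat).view(b, c, h, w)

        enhanced = x + self.gamma * out_s + self.beta * out_c

        # Adaptive weighted fusion: per-pixel softmax weights over modalities.
        weights = F.softmax(self.weight_head(enhanced), dim=1)   # (B, 2, H, W)
        return weights[:, 0:1] * f_rgb + weights[:, 1:2] * f_tir


if __name__ == "__main__":
    fuse = NonLocalChannelSpatialFusion(channels=256)
    f_rgb = torch.randn(1, 256, 22, 22)   # backbone features, visible branch
    f_tir = torch.randn(1, 256, 22, 22)   # backbone features, thermal branch
    print(fuse(f_rgb, f_tir).shape)       # torch.Size([1, 256, 22, 22])
```

Zero-initializing the residual scales lets both attention branches start as identity mappings, which keeps a pretrained Siamese backbone stable when only the fusion head is fine-tuned; this matches the spirit of the local fine-tuning strategy described above, though the exact training recipe here is an assumption.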
