Channel Exchanging for RGB-T Tracking.

Long Zhao, Honge Ren, Meng Zhu, Lingjixuan Xue

doi:10.3390/s21175800

Open Access

https://doi.org/10.3390/s21175800

Abstract

All-weather visual object tracking in open environments is difficult to achieve with single-modality input alone. Because RGB and thermal infrared (TIR) data are complementary across a range of complex environments, a more robust object tracking framework can be built on video data from both modalities. The method used to fuse RGB and TIR data is the core factor determining the performance of an RGB-T tracker, and existing RGB-T trackers have not solved this problem well. Aggregation-based fusion methods make poor use of the information within each single modality, and alignment-based methods make poor use of the information shared between the two modalities. To address both shortcomings, we use DiMP as the baseline tracker and design channel exchanging DiMP (CEDiMP), an RGB-T object tracking framework based on channel exchanging. During feature fusion, CEDiMP dynamically exchanges channels between the sub-networks of the two modalities while adding almost no parameters. The deep features produced by this channel-exchanging fusion have stronger representational ability. In addition, to address the poor generalization and poor long-term tracking ability of existing RGB-T trackers, we add further training of CEDiMP on the synthetic dataset LaSOT-RGBT. Extensive experiments demonstrate the effectiveness of the proposed model. CEDiMP achieves the best performance on two RGB-T object tracking benchmark datasets, GTOT and RGBT234, and performs outstandingly in generalization testing.
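
The channel exchanging idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exchange criterion, so everything specific here is an assumption: the function name `exchange_channels`, the per-channel importance scores (in the channel-exchanging literature these are typically the scaling factors of batch normalization layers), and the threshold value are all hypothetical. The sketch only shows why such a fusion step adds no learnable parameters: a channel judged unimportant in one modality is overwritten with the corresponding channel from the other modality.

```python
from typing import List, Sequence, Tuple

# One channel is an H x W feature map stored as nested lists.
Channel = List[List[float]]


def exchange_channels(
    feat_rgb: Sequence[Channel],
    feat_tir: Sequence[Channel],
    score_rgb: Sequence[float],
    score_tir: Sequence[float],
    threshold: float = 0.02,  # hypothetical value, not taken from the paper
) -> Tuple[List[Channel], List[Channel]]:
    """Swap low-importance channels between two modality feature maps.

    score_rgb / score_tir are assumed per-channel importance scores.
    A channel whose absolute score falls below `threshold` is replaced
    by the channel at the same index from the other modality.
    """
    if not (len(feat_rgb) == len(feat_tir) == len(score_rgb) == len(score_tir)):
        raise ValueError("features and scores must have the same channel count")

    out_rgb: List[Channel] = []
    out_tir: List[Channel] = []
    for ch_rgb, ch_tir, s_rgb, s_tir in zip(feat_rgb, feat_tir, score_rgb, score_tir):
        # Decisions use the original inputs, so the two directions are independent.
        out_rgb.append(ch_tir if abs(s_rgb) < threshold else ch_rgb)
        out_tir.append(ch_rgb if abs(s_tir) < threshold else ch_tir)
    return out_rgb, out_tir


# Toy example: 4 channels of 2 x 2 maps, RGB filled with 1.0 and TIR with 0.0.
rgb = [[[1.0, 1.0], [1.0, 1.0]] for _ in range(4)]
tir = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(4)]
new_rgb, new_tir = exchange_channels(
    rgb, tir,
    score_rgb=[0.5, 0.001, 0.3, 0.0],
    score_tir=[0.01, 0.4, 0.2, 0.9],
)
```

In this toy call, RGB channels 1 and 3 and TIR channel 0 fall below the threshold and are taken from the other modality; the remaining channels pass through unchanged. The actual CEDiMP criterion and where the exchange sits in the network may differ from this sketch.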

Highlights

Although object tracking methods based on visible images have made many breakthroughs in recent years in handling target state transitions and interference from similar objects, tracker performance decreases significantly in specific environments such as low illumination, strong light, rain, and haze.
We have obtained the synthetic dataset LaSOT-RGBT, which can be used for RGB-T long-term tracking.
We propose an RGB-T tracker, channel exchanging DiMP (CEDiMP), based on bimodal data fusion by channel exchanging.

Summary

Introduction

Although object tracking methods based on visible images have made many breakthroughs in recent years in handling target state transitions and interference from similar objects, tracker performance decreases significantly in specific environments such as low illumination, strong light, rain, and haze. The main reason is that the RGB images produced by a visible light camera are of extremely poor quality in these environments [1]. A thermal infrared camera, by contrast, can produce high-quality TIR images under the same conditions. Thermal infrared cameras are insensitive to lighting conditions and have strong penetrating ability. They capture infrared radiation of 0.75–13 μm wavelength emitted by objects above absolute zero temperature and form single-channel grayscale images of better quality [2]. The outline of the people is clearly visible in the TIR image in Figure 1 (right), while it is extremely fuzzy in the RGB image (left). Likewise, the number of people can be clearly determined from the TIR image in Figure 2 (right), while it cannot be made out at all in the RGB image (left).

Objectives

Methods

Results

Discussion

Conclusion