Under adverse environmental conditions such as rain, fog, and dim lighting, objects in visible-light images are poorly visible, so targets are easily lost during tracking. In recent years, many RGB (visible-light) trackers have achieved significant success on visual tracking challenges. However, these trackers perform poorly under special conditions such as occlusion and low light. Objects in thermal infrared images, by contrast, remain distinct under poor lighting. Because of this property, researchers have shown increasing interest in trackers that combine thermal infrared and visible-light imagery. However, some mainstream RGBT (red–green–blue and thermal) algorithms, such as MANET and ADNET, are anchor-based. They require anchor box sizes to be specified and introduce a substantial number of hyperparameters, which can lead to suboptimal performance when tracking targets that change dynamically. Moreover, these models rely on convolutional neural networks for feature extraction, which are limited in their ability to capture global features. In this paper, we introduce a novel network model, DAPAT, which combines the anchor-free concept with the Transformer architecture. DAPAT differs from previous models in several ways. First, we design a straightforward model that extracts precise global features from the template and search images. Second, we incorporate two enhancement modules that enhance template and search images of different sizes while suppressing the influence of non-target regions. Third, we employ a dual-stream feature fusion network to reduce the loss of image feature information caused by feature correlation operations. Finally, we compare the proposed tracker with several advanced RGBT trackers on three datasets (RGBT234, RGBT210, and GTOT). The experimental results demonstrate that our tracker improves robustness and success rate, among other performance metrics.