Abstract
The Siamese architecture has shown remarkable performance in the field of visual tracking. Although existing Siamese-based tracking methods achieve a reasonable balance between accuracy and speed, the performance of many trackers in complex scenes is often unsatisfactory, mainly because of interference factors such as target scale changes, occlusion, and fast motion. In these cases, many trackers cannot fully exploit the target feature information and suffer from information loss. In this work, we propose a novel parallel Transformer network architecture to achieve robust visual tracking. The proposed method comprises the Transformer-1 module, the Transformer-2 module, and a feature fusion head (FFH) based on the attention mechanism. The Transformer-1 and Transformer-2 modules serve as complementary branches in the parallel architecture. The FFH integrates the feature information of the two parallel branches, which efficiently exploits the feature dependencies between the template and the search region and comprehensively explores rich contextual information. Finally, by combining the core ideas of Siamese networks and Transformers, we present a simple and robust tracking framework called RPformer, which requires no prior knowledge and avoids the burden of tuning hyperparameters. Extensive experiments on seven tracking benchmarks show that the proposed method outperforms state-of-the-art trackers while meeting real-time requirements at a running speed exceeding 50.0 frames/s.
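To make the parallel design concrete, the following is a minimal PyTorch sketch of the structure the abstract describes: two Transformer branches jointly process template and search-region tokens, and an attention-based feature fusion head merges their outputs. The layer counts, embedding dimension, and the exact fusion rule (cross-attention with a residual connection) are illustrative assumptions, not the paper's published configuration.

```python
import torch
import torch.nn as nn


class ParallelTransformerFusion(nn.Module):
    """Sketch of a parallel two-branch Transformer with an attention-based
    feature fusion head (FFH). Dimensions and fusion rule are assumptions."""

    def __init__(self, dim: int = 256, heads: int = 8, depth: int = 2):
        super().__init__()
        def make_layer():
            return nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                              batch_first=True)
        # Two complementary branches over the same template/search tokens.
        self.branch1 = nn.TransformerEncoder(make_layer(), num_layers=depth)
        self.branch2 = nn.TransformerEncoder(make_layer(), num_layers=depth)
        # FFH: cross-attention lets one branch's features attend to the other's.
        self.fusion = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, template_tokens: torch.Tensor,
                search_tokens: torch.Tensor) -> torch.Tensor:
        # Concatenate template and search-region tokens (batch, tokens, dim)
        # so each branch can model dependencies between the two.
        tokens = torch.cat([template_tokens, search_tokens], dim=1)
        f1 = self.branch1(tokens)
        f2 = self.branch2(tokens)
        # Fuse the parallel branches: branch-1 features query branch-2
        # features, followed by a residual connection and normalization.
        fused, _ = self.fusion(query=f1, key=f2, value=f2)
        return self.norm(f1 + fused)


if __name__ == "__main__":
    # Toy shapes: 64 template tokens, 256 search-region tokens, 256-dim each.
    z = torch.randn(1, 64, 256)
    x = torch.randn(1, 256, 256)
    out = ParallelTransformerFusion()(z, x)
    print(out.shape)  # torch.Size([1, 320, 256])
```

In this sketch, concatenating template and search tokens before each branch is what lets self-attention capture cross-region dependencies; the cross-attention fusion then combines what the two branches learned, mirroring the complementary-branch idea in the abstract.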