Abstract

Efficient Convolution Operator (ECO) algorithms have achieved impressive performances in visual tracking. However, its feature extraction network of ECO is unconducive for capturing the correlation features of occluded and blurred targets between long-range complex scene frames. More so, its fixed weight fusion strategy does not use the complementary properties of deep and shallow features. In this paper, we propose a new target tracking method, namely ECO++, using deep feature adaptive fusion in a complex scene, in the following two aspects: First, we constructed a new temporal convolution mode and used it to replace the underlying convolution layer in Conformer network to obtain an improved Conformer network. Second, we adaptively fuse the deep features, which output through the improved Conformer network, by combining the Peak to Sidelobe Ratio (PSR), frame smoothness scores and adaptive adjustment weight. Extensive experiments on the OTB-2013, OTB-2015, UAV123, and VOT2019 benchmarks demonstrate that the proposed approach outperforms the state-of-the-art algorithms in tracking accuracy and robustness in complex scenes with occluded, blurred, and fast-moving targets.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call