Abstract
Visual object tracking often employs a multi-stage pipeline of feature extraction, target information integration, and bounding box estimation. To simplify this pipeline and unify the processes of feature extraction and target information integration, in this paper, we present a compact tracking framework, termed MixFormer, built upon transformers. Our core design is to utilize the flexibility of attention operations, and we propose a Mixed Attention Module (MAM) for simultaneous feature extraction and target information integration. This synchronous modeling scheme allows us to extract target-specific discriminative features and perform extensive communication between the target and search area. Based on MAM, we build our MixFormer trackers simply by stacking multiple MAMs and placing a localization head on top. Specifically, we instantiate two types of MixFormer trackers, a hierarchical tracker MixCvT and a non-hierarchical simple tracker MixViT. For these two trackers, we investigate a series of pre-training methods and uncover the different behaviors of supervised pre-training and self-supervised pre-training in our MixFormer trackers. We also extend masked autoencoder pre-training to our MixFormer trackers and design the competitive TrackMAE pre-training technique. Finally, to handle multiple target templates during online tracking, we devise an asymmetric attention scheme in MAM to reduce computational cost, and propose an effective score prediction module to select high-quality templates. Our MixFormer trackers set new state-of-the-art performance on seven tracking benchmarks, including LaSOT, TrackingNet, VOT2020, GOT-10k, OTB100, TOTB and UAV123. In particular, our MixViT-L achieves AUC scores of 73.3% on LaSOT, 86.1% on TrackingNet and 82.8% on TOTB.
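To make the mixed-attention idea concrete, below is a minimal sketch of how joint attention over template and search-area tokens, with the asymmetric scheme, could look. It assumes single-head dot-product attention with plain linear projections; the paper's actual MAM additionally involves multi-head attention and (for MixCvT) convolutional projections, and all tensor names here are illustrative.

import torch
import torch.nn.functional as F


def mixed_attention(template, search, w_q, w_k, w_v, asymmetric=True):
    """Jointly attend over template and search-area tokens.

    template: (B, Nt, C) template tokens
    search:   (B, Ns, C) search-area tokens
    w_q/w_k/w_v: (C, C) shared projection weights (illustrative)
    """
    tokens = torch.cat([template, search], dim=1)  # (B, Nt+Ns, C)
    q = tokens @ w_q
    k = tokens @ w_k
    v = tokens @ w_v
    scale = q.shape[-1] ** -0.5
    nt = template.shape[1]

    # Search queries attend to all tokens, mixing feature extraction
    # with target-search communication in one operation.
    q_s = q[:, nt:]
    out_s = F.softmax(q_s @ k.transpose(-2, -1) * scale, dim=-1) @ v

    # Asymmetric scheme: template queries attend only to template keys,
    # so template features can be computed once and cached online.
    q_t = q[:, :nt]
    k_t, v_t = (k[:, :nt], v[:, :nt]) if asymmetric else (k, v)
    out_t = F.softmax(q_t @ k_t.transpose(-2, -1) * scale, dim=-1) @ v_t

    return out_t, out_s

With asymmetric=True, the template branch is independent of the search frame, which is what allows multiple cached templates to be handled cheaply during online tracking.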