Active Token Mixer

Guoqiang Wei, Zhizheng Zhang, Cuiling Lan, Yan Lu, Zhibo Chen

doi:10.1609/aaai.v37i3.25376

Abstract

The three existing dominant network families, i.e., CNNs, Transformers and MLPs, differ from each other mainly in the ways of fusing spatial contextual information, leaving designing more effective token-mixing mechanisms at the core of backbone architecture development. In this work, we propose an innovative token-mixer, dubbed Active Token Mixer (ATM), to actively incorporate contextual information from other tokens in the global scope into the given query token. This fundamental operator actively predicts where to capture useful contexts and learns how to fuse the captured contexts with the query token at channel level. In this way, the spatial range of token-mixing can be expanded to a global scope with limited computational complexity, where the way of token-mixing is reformed. We take ATMs as the primary operators and assemble them into a cascade architecture, dubbed ATMNet. Extensive experiments demonstrate that ATMNet is generally applicable and comprehensively surpasses different families of SOTA vision backbones by a clear margin on a broad range of vision tasks, including visual recognition and dense prediction tasks. Code is available at https://github.com/microsoft/ActiveMLP.
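
As a rough illustration of the idea in the abstract (predict where to fetch context for each channel, then learn how to fuse it with the query), here is a minimal NumPy sketch over a 1D token sequence. It is not the authors' implementation: the function name `active_token_mix`, the weight matrices `w_offset` and `w_fuse`, the `max_offset` bound, the integer rounding of offsets, and the sigmoid gate are all assumptions made for illustration. The official code is at the repository linked above.

```python
import numpy as np

def active_token_mix(x, w_offset, w_fuse, max_offset=3):
    """Toy channel-wise active token mixing along one axis.

    x:        (N, C) array of N tokens with C channels
    w_offset: (C, C) hypothetical weights predicting one offset per channel
    w_fuse:   (C, C) hypothetical weights predicting one fusion gate per channel
    """
    n, c = x.shape
    # "Where": each query token predicts, per channel, a relative offset
    # to the token it wants to read from (rounded to an integer here).
    offsets = np.rint(max_offset * np.tanh(x @ w_offset)).astype(int)  # (N, C)
    # Absolute source index per (query, channel), clipped to sequence bounds.
    src = np.clip(np.arange(n)[:, None] + offsets, 0, n - 1)  # (N, C)
    # Gather: channel j of query i is taken from token src[i, j].
    context = x[src, np.arange(c)[None, :]]  # (N, C)
    # "How": a per-channel sigmoid gate blends the context with the query.
    gate = 1.0 / (1.0 + np.exp(-(x @ w_fuse)))  # (N, C)
    return gate * context + (1.0 - gate) * x

rng = np.random.default_rng(0)
tokens = rng.normal(size=(8, 4))   # 8 tokens, 4 channels
w_off = rng.normal(size=(4, 4))
w_fu = rng.normal(size=(4, 4))
mixed = active_token_mix(tokens, w_off, w_fu)
print(mixed.shape)
```

In this sketch each query reads only one context token per channel, so the gather costs on the order of N x C lookups rather than comparing every pair of tokens, which is one way to read the abstract's claim of global reach at limited cost.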


Journal: Proceedings of the ... AAAI Conference on Artificial Intelligence. AAAI Conference on Artificial Intelligence
Publication Date: Jun 26, 2023
Citations: 5

Similar Papers

NomMer: Nominate Synergistic Context in Vision Transformer for Visual Recognition
Hao Liu ... Xin Li
01 Jun 2022

Expediting Large-Scale Vision Transformer for Dense Prediction Without Fine-Tuning.
Yuhui Yuan ... Chao Zhang
IEEE Transactions on Pattern Analysis and Machine Intelligence | VOL. 46
01 Jan 2024

Task-Specific Magnetic Fields from the Left Human Prefrontal Cortex
L. F. H. Basile ... I. M. Tarkka
01 Jan 1999

Vision transformers for dense prediction: A survey
Shuangquan Zuo ... Xuanhong Wang
Knowledge Based Systems | VOL. 253
28 Jul 2022
