Abstract

Existing deep learning-based methods typically follow either image-level or feature-level fusion frameworks that extract features uniformly or separately, overlooking specialized interactive information learning, which can limit fusion performance. To address this challenge, we devise a powerful fusion baseline built on adaptive interactive Transformer learning, named AITFuse. Unlike previous methods, our network alternately captures local and global relationships through collaborative learning of a CNN and a Transformer. In particular, we propose a cascaded token-wise and channel-wise Vision Transformer architecture with different attention mechanisms to model long-range contexts, allowing features to communicate across tokens and across independent channels in an interactive manner. On this basis, a modal-specific feature rectification module employs self-attention to refine distinctive features within the same domain for efficient encoding, while a cross-modal feature integration module uses cross-attention to fuse complementary characteristics from different domains for multi-level decoding. In addition, we discard learned position embeddings, so the model can process images of arbitrary size without splitting operations. Extensive experiments on mainstream datasets and downstream tasks demonstrate the rationality and superiority of AITFuse. The code will be available at https://github.com/Zhishe-Wang/AITFuse.
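To make the attention patterns named in the abstract concrete, the sketch below illustrates the three ingredients in PyTorch: token-wise self-attention over spatial positions, channel-wise self-attention over feature channels, and cross-attention between modalities. It is a minimal illustration under our own assumptions; the module names, head counts, and scaling factor are hypothetical and do not reproduce the authors' implementation (see the repository linked above for the official code).

```python
import torch
import torch.nn as nn


class TokenWiseAttention(nn.Module):
    """Self-attention across spatial tokens: each of the H*W positions attends to all others."""

    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                          # x: (B, C, H, W)
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)      # (B, H*W, C)
        out, _ = self.attn(tokens, tokens, tokens)
        return out.transpose(1, 2).reshape(b, c, h, w)


class ChannelWiseAttention(nn.Module):
    """Self-attention across channels: a C x C affinity that is independent of spatial size."""

    def __init__(self, dim):
        super().__init__()
        self.qkv = nn.Conv2d(dim, dim * 3, kernel_size=1)
        self.proj = nn.Conv2d(dim, dim, kernel_size=1)

    def forward(self, x):                          # x: (B, C, H, W)
        b, c, h, w = x.shape
        q, k, v = self.qkv(x).flatten(2).chunk(3, dim=1)        # each (B, C, H*W)
        # Hypothetical scaling choice; the channel affinity matrix is (B, C, C).
        attn = torch.softmax(q @ k.transpose(1, 2) / (h * w) ** 0.5, dim=-1)
        out = (attn @ v).reshape(b, c, h, w)
        return self.proj(out)


class CrossModalIntegration(nn.Module):
    """Cross-attention: tokens of one modality query the tokens of the other."""

    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, ir_feat, vis_feat):          # both (B, C, H, W)
        b, c, h, w = ir_feat.shape
        q = ir_feat.flatten(2).transpose(1, 2)     # queries from infrared features
        kv = vis_feat.flatten(2).transpose(1, 2)   # keys/values from visible features
        fused, _ = self.attn(q, kv, kv)
        return fused.transpose(1, 2).reshape(b, c, h, w)


if __name__ == "__main__":
    ir = torch.randn(1, 32, 60, 80)                # arbitrary, non-square spatial size
    vis = torch.randn(1, 32, 60, 80)
    x = TokenWiseAttention(32)(ir)
    x = ChannelWiseAttention(32)(x)
    print(CrossModalIntegration(32)(x, vis).shape)  # torch.Size([1, 32, 60, 80])
```

Note that the channel-wise branch builds a C x C affinity and uses no positional encoding, so its cost does not grow quadratically with spatial resolution; this is consistent with the abstract's claim of handling images of arbitrary size without splitting operations.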
