Abstract

Multisensor image fusion is a challenging task that aims to produce a composite image by fusing visible (VI) and infrared (IR) images. Deep neural networks have shown impressive performance for VI and IR image fusion; however, the majority of them overlook the internal patch-recurrence property of the source images, which limits their ability to learn diverse features. To address this issue, we propose a novel fusion framework based on a vision transformer and graph attention that exploits local patch repetition to enhance feature representation and texture recovery. In particular, the proposed transformer blocks learn high-frequency, domain-specific information from the source images. The graph attention mechanism provides additional guidance for the features by exploiting similarity and symmetry information across patches. Furthermore, the proposed graph attention fusion block (GAFB) improves the selectivity and effectiveness of feature learning by identifying significant corresponding local and global details of the source images. The complementary information, comprising long-range and local symmetric details across domains, is combined while preserving the appropriate apparent intensity to generate the fused image. Extensive evaluations on benchmark datasets demonstrate the superior performance of the proposed technique. Notably, our approach achieves SSIM scores of 0.7552 on the TNO dataset and 0.7673 on the RoadScene dataset, surpassing the state-of-the-art techniques used for evaluation.
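
To make the patch-graph idea concrete, below is a minimal, hypothetical PyTorch sketch of a graph-attention fusion step over patch embeddings. The class name, single-head attention, and dimensions are illustrative assumptions, not the paper's exact GAFB: it treats each patch embedding as a graph node, weights edges by pairwise patch similarity (so recurring, symmetric patches reinforce each other), and then fuses the two modalities.

```python
import torch
import torch.nn as nn

class GraphAttentionFusion(nn.Module):
    """Hypothetical sketch of a graph-attention fusion block:
    patches are graph nodes, attention weights come from pairwise
    patch similarity, and VI/IR features are fused afterwards."""

    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim, bias=False)
        self.k = nn.Linear(dim, dim, bias=False)
        self.v = nn.Linear(dim, dim, bias=False)
        self.fuse = nn.Linear(2 * dim, dim)

    def attend(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, N, C) patch embeddings; a dense graph over the N patches.
        q, k, v = self.q(x), self.k(x), self.v(x)
        attn = torch.softmax(q @ k.transpose(-2, -1) / x.shape[-1] ** 0.5, dim=-1)
        # Each patch aggregates features from its most similar patches,
        # exploiting the internal patch-recurrence of the source image.
        return attn @ v

    def forward(self, vis: torch.Tensor, ir: torch.Tensor) -> torch.Tensor:
        # vis, ir: (B, N, C) patch features from the two modalities.
        vis_g = self.attend(vis)  # intra-modal patch-recurrence cues (VI)
        ir_g = self.attend(ir)    # intra-modal patch-recurrence cues (IR)
        # Concatenate and project to combine complementary details.
        return self.fuse(torch.cat([vis_g, ir_g], dim=-1))

# Toy usage: a batch of 2 images, 64 patches each, 128-dim embeddings.
block = GraphAttentionFusion(dim=128)
fused = block(torch.randn(2, 64, 128), torch.randn(2, 64, 128))
print(fused.shape)  # torch.Size([2, 64, 128])
```

In this sketch the similarity graph is dense (every patch attends to every other); a practical variant could sparsify the graph by keeping only the top-k most similar patches per node to focus on true patch recurrence.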
