Abstract

Existing deep learning-based fusion methods usually model local information through convolution operations or global contexts via the self-attention mechanism. This serial scheme discards local features when modeling global context, or vice versa, which can limit fusion performance. To tackle this issue, we introduce a mixed-frequency hierarchical guided learning network, or FreqFuse for short. More specifically, we first design a parallel frequency mixer based on a channel splitting mechanism, with max-pooling and self-attention paths, to learn both high- and low-frequency information. The mixer provides more comprehensive features over a wide frequency range than a single type of dependency. Second, we develop a dual-Transformer integration module to guide the fusion process. The assigned weights are calculated by cross-token and cross-channel Transformers, and are used to measure the activity levels of the source images and preserve their modality characteristics in the intermediate fused features. On this basis, we build a hierarchical guidance decoder to reconstruct the final fused image. The cross-scale mixed-frequency features are reused to gradually refine the activity levels of the different modality images, encouraging a fused result that is highly informative and strongly characteristic of its sources. We benchmark the proposed FreqFuse on different datasets, and experimental results demonstrate that it achieves impressive performance compared with other methods.
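The sketch below illustrates the parallel frequency mixer idea summarized above: input channels are split, a max-pooling path emphasizes high-frequency (local) responses, a self-attention path captures low-frequency (global) context, and the two bands are recombined. It is a minimal illustration only; the class name, the even 50/50 channel split, the specific layer choices, and the use of PyTorch are assumptions, not the authors' implementation.

```python
# Minimal sketch of a parallel frequency mixer with channel splitting.
# All names and hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn


class ParallelFrequencyMixer(nn.Module):
    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        assert channels % 2 == 0, "channels are split evenly between the two paths"
        half = channels // 2
        # High-frequency path: max-pooling keeps sharp, local responses.
        self.high_freq = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(half, half, kernel_size=3, padding=1),
        )
        # Low-frequency path: self-attention models global, smooth context.
        self.attn = nn.MultiheadAttention(half, num_heads, batch_first=True)
        self.fuse = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Channel splitting: one half of the channels per frequency path.
        x_hi, x_lo = torch.chunk(x, 2, dim=1)
        hi = self.high_freq(x_hi)

        b, c, h, w = x_lo.shape
        tokens = x_lo.flatten(2).transpose(1, 2)      # (B, H*W, C/2)
        lo, _ = self.attn(tokens, tokens, tokens)     # global self-attention
        lo = lo.transpose(1, 2).reshape(b, c, h, w)

        # Recombine both bands into mixed-frequency features.
        return self.fuse(torch.cat([hi, lo], dim=1))


if __name__ == "__main__":
    x = torch.randn(1, 64, 32, 32)
    print(ParallelFrequencyMixer(64)(x).shape)  # torch.Size([1, 64, 32, 32])
```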
