Infrared and visible image fusion (IVIF) aims to integrate the complementary advantages of images from different modalities. Most existing deep learning-based methods focus on a single IVIF task and ignore the effect of frequency information on the fusion results, and thus fail to fully preserve salient structures and important texture details. The core idea of this paper is based on two observations: (1) image content can be characterized by different frequency-domain components, where low frequencies represent base information such as salient structure, while high frequencies contain texture details; (2) multi-task learning generally achieves better performance than single-task learning. Based on these observations, we propose a frequency-aware and collaborative learning fusion model (FCLFusion) for infrared and visible images. The model takes image fusion as the main task and introduces image reconstruction as an auxiliary task to collaboratively optimize the network, thereby improving fusion quality. Specifically, we transform spatial-domain features into the frequency domain and develop a frequency feature fusion module that guides the primary network in generating the fused image, while a sub-network generates the reconstructed images. We also preserve saliency and detail features via frequency skip connections. Moreover, we propose a hybrid loss function consisting of two terms: a frequency loss and a self-supervised reconstruction loss. The former prevents information loss in the frequency domain, while the latter improves the extraction of vital information. Extensive experiments on three public datasets demonstrate that FCLFusion outperforms ten state-of-the-art fusion models.
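For intuition only, below is a minimal sketch of what a frequency loss of this kind might look like, assuming a PyTorch implementation that penalizes L1 differences between FFT amplitude and phase spectra. The function name `frequency_loss`, the equal weighting of the two terms, and the choice of spectra are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch of a frequency-domain loss (not the paper's exact loss):
# penalize L1 differences between the amplitude and phase spectra of the
# fused image and a source image.
import torch


def frequency_loss(fused: torch.Tensor, source: torch.Tensor) -> torch.Tensor:
    """L1 distance between FFT amplitude and phase spectra.

    Both tensors are assumed to be (B, C, H, W); the equal weighting of the
    two terms is an illustrative choice, not taken from the paper.
    """
    # 2-D FFT over the spatial dimensions yields complex-valued spectra.
    f_fused = torch.fft.fft2(fused, dim=(-2, -1))
    f_source = torch.fft.fft2(source, dim=(-2, -1))
    # Compare amplitude (magnitude) and phase (angle) spectra separately,
    # so structural and positional frequency content are both constrained.
    amp_term = torch.mean(torch.abs(f_fused.abs() - f_source.abs()))
    pha_term = torch.mean(torch.abs(f_fused.angle() - f_source.angle()))
    return amp_term + pha_term
```

In practice such a term would be computed against both the infrared and visible sources and combined with the self-supervised reconstruction loss; the relative weighting between the terms is a design choice the abstract does not specify.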