Infrared and visible image fusion (IVIF) aims to obtain an image that contains complementary information about the source images. However, it is challenging to define complementary information between source images in the lack of ground truth and without borrowing prior knowledge. Therefore, we propose a semisupervised transfer learning-based method for IVIF, termed STFuse, which aims to transfer knowledge from an informative source domain to a target domain, thus breaking the above limitations. The critical aspect of our method is to borrow supervised knowledge from the multifocus image fusion (MFIF) task and to filter out task-specific attribute knowledge by using a guidance loss Lg , which motivates its cross-task use in IVIF tasks. Using this cross-task knowledge effectively alleviates the limitation of the lack of ground truth on fusion performance, and the complementary expression ability under the constraint of supervised knowledge is more instructive than prior knowledge. Moreover, we designed a cross-feature enhancement module (CEM) that utilizes self-attention and mutual-attention features to guide each branch to refine features and then facilitate the integration of cross-modal complementary features. Extensive experiments demonstrate that our method has good advantages in terms of visual quality and statistical metrics, as well as the docking of high-level vision tasks, compared with other state-of-the-art methods.
Read full abstract