Abstract

In infrared (IR) and visible image fusion, the visual appearance of fused images produced by an end-to-end fusion network depends on a loss function that defines the desired properties of the fusion results. Previous approaches mainly measure the similarity of an output image to the respective IR and visible images using pixel-wise loss functions such as mean squared error or structural similarity. This, however, tends to preserve only local information in the fused images and may lose semantic information about the scene. In this work, we propose a new joint loss function for image fusion that considers visual quality and subsequent applications simultaneously. Our joint loss function guides the fusion network by transferring the knowledge of a pre-trained network to leverage perceptual similarity. A contrast loss term prevents the reduction of global and local image contrast in the fused image, and a conventional data-fidelity term is also included. The joint loss function is then used to train our U-Net-like fusion network, a hybrid of convolutional and transformer blocks. Experimental results show that the proposed method preserves detailed scene structure from both source images and achieves better quantitative results. Furthermore, the proposed fusion scheme can also combine multi-exposure and multi-focus image pairs. To further demonstrate the effectiveness of IR and visible image fusion, we apply the fusion results to downstream tasks, where the proposed method improves upon using the original modalities alone.
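The abstract describes the joint loss only at a high level. The sketch below illustrates, in a generic PyTorch setting, how a perceptual term (features from a frozen pre-trained network), a contrast term (a local-window contrast proxy plus a global standard-deviation term), and a pixel-wise data-fidelity term might be combined into one objective. The VGG-16 backbone, the window size, and the weights w_perc, w_con, and w_fid are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import vgg16


class JointFusionLoss(nn.Module):
    """Illustrative joint loss: perceptual + contrast + data-fidelity terms.

    The layer slice, window size, and weights below are assumptions for
    illustration; they are not taken from the paper.
    """

    def __init__(self, w_perc=1.0, w_con=1.0, w_fid=1.0):
        super().__init__()
        # Frozen pre-trained VGG-16 features act as the "teacher" for perceptual similarity.
        self.vgg = vgg16(weights="IMAGENET1K_V1").features[:16].eval()
        for p in self.vgg.parameters():
            p.requires_grad = False
        self.w_perc, self.w_con, self.w_fid = w_perc, w_con, w_fid

    def _features(self, x):
        # Replicate single-channel inputs to 3 channels so they fit the VGG input.
        return self.vgg(x.repeat(1, 3, 1, 1))

    def _local_contrast(self, x, k=7):
        # Local standard deviation over a k x k window as a simple contrast proxy.
        mean = F.avg_pool2d(x, k, stride=1, padding=k // 2)
        sq_mean = F.avg_pool2d(x * x, k, stride=1, padding=k // 2)
        return torch.sqrt((sq_mean - mean * mean).clamp(min=1e-6))

    def forward(self, fused, ir, vis):
        # Perceptual term: match deep features of the fused image to both sources.
        f_fused = self._features(fused)
        loss_perc = F.l1_loss(f_fused, self._features(ir)) + \
                    F.l1_loss(f_fused, self._features(vis))

        # Contrast term: push the local contrast of the fused image toward the
        # per-pixel maximum of the source contrasts, plus a global std term.
        target_con = torch.max(self._local_contrast(ir), self._local_contrast(vis))
        loss_con = F.l1_loss(self._local_contrast(fused), target_con) + \
                   (fused.std() - torch.max(ir.std(), vis.std())).abs()

        # Data-fidelity term: plain pixel-wise closeness to both source images.
        loss_fid = F.l1_loss(fused, ir) + F.l1_loss(fused, vis)

        return self.w_perc * loss_perc + self.w_con * loss_con + self.w_fid * loss_fid
```

In practice such a loss would be evaluated on the output of the fusion network together with the IR and visible inputs at each training step, with the term weights tuned on a validation set.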
