Although some deep learning-based image fusion approaches have achieved promising results, how to extract information-rich features from different source images and preserve them in the fused image with minimal distortion remains a challenging issue. Here, we propose a GAN-based scheme with a multi-scale feature extractor and global-local discriminators for infrared and visible image fusion. The generator is built on a Y-Net backbone and incorporates residual dense blocks (RDblocks) to learn discriminative multi-scale representations that better capture the intrinsic characteristics of each modality, yielding more realistic fused images. During feature reconstruction, cross-modality shortcuts with contextual attention (CMSCA) selectively aggregate features across scales and levels to construct information-rich fused images with improved visual quality. To enrich the information content of the fused image, we not only constrain structure and contrast with the structural similarity index but also evaluate intensity and gradient similarities at both the feature and image levels. Two global-local discriminators, each unifying a global GAN with a PatchGAN, capture finer differences between the generated image and the reference images, forcing the generator to learn both the local radiation information and the pervasive global details of the two source images. Notably, fusion is achieved through adversarial training without hand-crafted fusion rules. Extensive evaluations demonstrate that the proposed scheme outperforms state-of-the-art methods in preserving meaningful information.
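The abstract describes the global-local discriminator only at a high level (a global GAN branch unified with a PatchGAN branch); it does not give layer configurations. The following is a minimal PyTorch sketch of that idea under assumed design choices: the class name, the shared convolutional trunk, and all channel counts and kernel sizes are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: a discriminator with a PatchGAN head (local patch scores)
# and a global head (whole-image score), as the abstract's "global-local" design suggests.
# Layer sizes, the shared trunk, and all hyperparameters below are assumptions.
import torch
import torch.nn as nn

class GlobalLocalDiscriminator(nn.Module):
    """Hypothetical global-local discriminator: a shared convolutional trunk feeds
    (i) a PatchGAN head scoring local patches and (ii) a global head scoring the image."""
    def __init__(self, in_ch: int = 1, base: int = 32):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(in_ch, base, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base, base * 2, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base * 2, base * 4, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
        )
        # Local (PatchGAN) head: one realism score per receptive-field patch.
        self.patch_head = nn.Conv2d(base * 4, 1, 4, stride=1, padding=1)
        # Global head: pool the feature map to a single realism score for the whole image.
        self.global_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(base * 4, 1)
        )

    def forward(self, x):
        feat = self.trunk(x)
        return self.global_head(feat), self.patch_head(feat)

# Usage: score a (dummy) fused image against a reference modality during adversarial training.
if __name__ == "__main__":
    d = GlobalLocalDiscriminator(in_ch=1)
    fused = torch.randn(2, 1, 128, 128)            # placeholder fused images
    global_score, patch_scores = d(fused)
    print(global_score.shape, patch_scores.shape)  # [2, 1] and [2, 1, H', W']
```

In the paper's setup, one such discriminator would compare the fused image with the infrared source and another with the visible source, so the generator is pushed to preserve both local radiation information and global scene details.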