Infrared and visible image fusion aims to leverage the complementary advantages of the two modalities to extract richer information from a scene. In well-lit environments, it is important to preserve the characteristics of the visible image and to recover details lost to overexposure; in dark environments, the infrared modality is typically preferred. However, current methods cannot adaptively adjust their reliance on either modality, which leads to suboptimal fusion quality. A grayscale histogram visually represents the dynamic range of an image's intensity distribution, revealing its contrast and the degree of brightness variation. Drawing on this insight, we propose a novel image fusion method, Siam-AUnet. It transforms the luminance histogram distribution of the visible image into weights that guide the fusion of the two modalities. We adopt a Siamese network architecture with a multi-scale attention mechanism to focus more effectively on key features while reducing the number of parameters. Extensive experiments on different datasets and comparisons with other methods demonstrate the superiority of the proposed fusion approach. By exploiting histogram characteristics, Siam-AUnet adaptively favors the modality with clearer texture details. Qualitative and quantitative results on the MSRS test set show that Siam-AUnet achieves the best performance on EN, SSIM, and MS-SSIM, and the second-best performance on SCD, VIF, and Qabf. In well-lit conditions it preserves the detailed features of visible images and the salient features of infrared images, while under adverse lighting conditions it places greater emphasis on infrared features. Our code will be publicly available at https://github.com/xkangKK/Siam-AUnet.
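The abstract does not specify the exact histogram-to-weight mapping, so the following is a minimal sketch under an assumption: the fusion weight is derived from the spread (entropy) of the visible image's luminance histogram, so the result leans towards the visible modality when it exhibits rich contrast and towards the infrared modality otherwise. The function names `histogram_fusion_weight` and `fuse` are hypothetical and not taken from the Siam-AUnet code.

```python
import numpy as np

def histogram_fusion_weight(vis_gray: np.ndarray, n_bins: int = 256) -> float:
    """Map the luminance histogram of a visible image to a fusion weight in [0, 1].

    Hypothetical sketch: the weight grows with how widely the luminance values
    are spread, used here as a proxy for contrast / visible-texture richness.
    """
    hist, _ = np.histogram(vis_gray, bins=n_bins, range=(0, 255), density=True)
    hist = hist + 1e-12                       # avoid log(0) for empty bins
    entropy = -np.sum(hist * np.log2(hist))   # spread of the luminance distribution
    return float(np.clip(entropy / np.log2(n_bins), 0.0, 1.0))  # normalize to [0, 1]

def fuse(vis: np.ndarray, ir: np.ndarray) -> np.ndarray:
    """Weighted fusion: lean towards the visible image when it is well exposed,
    towards the infrared image under poor lighting (assumed weighting scheme)."""
    w = histogram_fusion_weight(vis)
    return w * vis.astype(np.float32) + (1.0 - w) * ir.astype(np.float32)
```

In the full method these weights modulate features inside the Siamese U-Net rather than blending raw pixels; the pixel-level blend above is only meant to illustrate the histogram-driven adaptivity described in the abstract.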