Infrared and visible image fusion aims to integrate complementary information from the two modalities, so that the fused result combines abundant texture details with salient targets and is well suited to human visual observation. However, the loss of important details remains a challenge in the fusion process. In this paper, a novel method based on a cross-modality reinforcement module and a multi-attention fusion strategy is proposed to strengthen an end-to-end CNN backbone. Specifically, a cross-modality architecture is applied to compensate for the spectral differences between the heterogeneous images; multi-scale strip pooling is then used as a further feature representation tool to model long-range dependencies precisely; a detail injection block is devised to enhance texture contrast and target intensity; and finally, a multi-attention fusion module is proposed to integrate the features progressively. Extensive comparative experiments on several datasets demonstrate the superiority of the proposed method in both quantitative metrics and visual perception; for example, the visual information fidelity metric averaged over all test samples reaches 0.964.
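To illustrate the strip-pooling idea invoked for long-range context, the sketch below shows a generic strip-pooling block in PyTorch. It is a minimal sketch in the spirit of standard strip pooling (Hou et al., CVPR 2020), not the paper's exact module; the class name, gating scheme, and layer choices are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StripPooling(nn.Module):
    """Illustrative strip-pooling block: pool along full-width and
    full-height strips to capture long-range dependencies, then gate
    the input with the combined context (hypothetical design)."""
    def __init__(self, channels):
        super().__init__()
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # average over width  -> H x 1 map
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # average over height -> 1 x W map
        self.conv_h = nn.Conv2d(channels, channels, (3, 1), padding=(1, 0), bias=False)
        self.conv_w = nn.Conv2d(channels, channels, (1, 3), padding=(0, 1), bias=False)
        self.fuse = nn.Conv2d(channels, channels, 1, bias=False)

    def forward(self, x):
        _, _, h, w = x.size()
        # Expand each 1-D strip summary back to the full spatial grid.
        xh = F.interpolate(self.conv_h(self.pool_h(x)), size=(h, w),
                           mode='bilinear', align_corners=False)
        xw = F.interpolate(self.conv_w(self.pool_w(x)), size=(h, w),
                           mode='bilinear', align_corners=False)
        # Gate the input features with the combined long-range context.
        return x * torch.sigmoid(self.fuse(F.relu(xh + xw)))
```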