Abstract

Infrared images contain typical targets characterized by their pixel intensity distribution, while visible images preserve structural details represented by edges and gradients. Infrared and visible image fusion aims to merge the key information from both modalities and produce a more comprehensive image with superior target perception and better detail representation. Most existing methods fall short of this goal and tend to generate limited fused results. To this end, a dual-path residual attention fusion network, named DRAFusion, is proposed in this paper. We construct a multi-scale dense network architecture that adequately aggregates horizontal and vertical intermediate features at all scales and layers. In the fusion layer, we develop a residual attention fusion module that cascades channel-wise and spatial-wise attention models to capture global contextual dependencies and to interact with the feature maps so that each path focuses more on its respective key information. In addition, we design a novel feature adaptation loss function that controls the proportion of key information from each modality during training, maintaining a better balance between the fused result and the source images. Extensive experiments demonstrate that DRAFusion achieves remarkable performance on three testing datasets and outperforms other methods in terms of both qualitative visual comparison and quantitative evaluation.
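
As a rough illustration of the cascaded channel- and spatial-wise attention design described in the abstract, the sketch below shows one plausible PyTorch realization of such a residual attention fusion block. All names (ChannelAttention, SpatialAttention, ResidualAttentionFusion), the additive pre-fusion of the two feature paths, and the hyper-parameters (reduction ratio, kernel size) are assumptions made for illustration; they are not the authors' actual implementation.

```python
# Minimal sketch of a cascaded channel -> spatial attention block with a
# residual connection, assuming a CBAM-style design. Hypothetical names and
# hyper-parameters; not the paper's published code.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)  # global context per channel
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = torch.sigmoid(self.mlp(self.avg_pool(x)))  # (B, C, 1, 1) weights
        return x * w


class SpatialAttention(nn.Module):
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size,
                              padding=kernel_size // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Aggregate channel statistics, then learn a per-pixel attention map.
        avg_map = x.mean(dim=1, keepdim=True)
        max_map, _ = x.max(dim=1, keepdim=True)
        w = torch.sigmoid(self.conv(torch.cat([avg_map, max_map], dim=1)))
        return x * w


class ResidualAttentionFusion(nn.Module):
    """Cascade channel then spatial attention over pre-fused features,
    with a residual connection back to the pre-fused input."""

    def __init__(self, channels: int):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()

    def forward(self, ir_feat: torch.Tensor,
                vis_feat: torch.Tensor) -> torch.Tensor:
        fused = ir_feat + vis_feat        # simple additive pre-fusion (assumed)
        out = self.sa(self.ca(fused))     # cascaded attention
        return fused + out                # residual connection
```

For example, with two feature maps of shape (1, 64, 128, 128) from the infrared and visible encoder paths, `ResidualAttentionFusion(64)(ir_feat, vis_feat)` returns a fused map of the same shape, with the attention weights re-emphasizing each modality's key information before decoding.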
