Abstract

Infrared and visible image fusion aims to generate a synthetic image with superior scene representation and better visual perception. Existing deep learning-based fusion methods rely solely on convolution operations to extract features within a local receptive field, without fully considering their multiscale and long-range dependency characteristics, and may therefore fail to preserve essential global context from the source images. To this end, we develop a novel and efficient fusion network based on a dense Res2Net backbone and double nonlocal attention models, termed Res2Fusion. We introduce Res2Net blocks and dense connections into the encoder network, which provide multiple receptive fields for extracting multiscale features and retain as much meaningful information as possible for the fusion task. In addition, we develop double nonlocal attention models as a fusion layer to model long-range dependencies on the local features. Specifically, these attention models refine the feature maps obtained by the encoder network to focus more on prominent infrared targets and distinct visible details. Finally, the comprehensive attention maps are passed through a simple decoder network to generate the fused result. Extensive experiments demonstrate that the proposed method simultaneously retains highlighted infrared targets and rich visible details, and outperforms other state-of-the-art fusion methods in terms of subjective and objective evaluation. The corresponding code is publicly available at https://github.com/Zhishe-Wang/Res2Fusion.
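
The abstract describes a three-stage pipeline: a multiscale dense encoder, a nonlocal attention fusion layer, and a lightweight decoder. As a rough illustration of the long-range dependency modeling performed by such a fusion layer, the sketch below implements a generic nonlocal (self-attention) block over 2-D feature maps in PyTorch. The class name, channel reduction factor, and the additive combination of the two attended feature maps are assumptions made for illustration and are not taken from the Res2Fusion implementation.

```python
import torch
import torch.nn as nn


class NonLocalAttention(nn.Module):
    """Generic nonlocal (self-attention) block over 2-D feature maps.

    A minimal sketch of long-range dependency modeling; the actual
    Res2Fusion fusion layer may differ in structure and normalization.
    """

    def __init__(self, channels, reduction=2):
        super().__init__()
        inter = channels // reduction
        self.query = nn.Conv2d(channels, inter, kernel_size=1)
        self.key = nn.Conv2d(channels, inter, kernel_size=1)
        self.value = nn.Conv2d(channels, inter, kernel_size=1)
        self.out = nn.Conv2d(inter, channels, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)   # B x HW x C'
        k = self.key(x).flatten(2)                      # B x C' x HW
        v = self.value(x).flatten(2).transpose(1, 2)    # B x HW x C'
        attn = torch.softmax(q @ k, dim=-1)             # B x HW x HW affinity
        y = (attn @ v).transpose(1, 2).reshape(b, -1, h, w)
        return x + self.out(y)                          # residual connection


# Hypothetical usage: one attention pass per modality, combined additively
# (the combination rule here is illustrative only).
ir_feat = torch.randn(1, 64, 128, 128)   # infrared encoder features (assumed shape)
vis_feat = torch.randn(1, 64, 128, 128)  # visible encoder features (assumed shape)
attn = NonLocalAttention(64)
fused = attn(ir_feat) + attn(vis_feat)
```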
