Abstract

Most recent methods integrate features at the fusion layer by simple concatenation or addition, which fails to adequately account for the intrinsic characteristics of the different modal images and for feature interaction across scales, and thus limits fusion performance. To address this, we introduce a cross-scale iterative attentional adversarial fusion network, termed CrossFuse. Specifically, in the generator, we design a cross-modal attention integration module to merge the intrinsic content of the different modal images. Parallel spatial-independent and channel-independent pathways compute attentional weights that measure the activity levels of the source images at the same scale. Moreover, we construct a cross-scale iterative decoder that interacts the features of the two modalities across scales and progressively refines their activity levels. In this way, the generator learns to integrate the modality characteristics through attentional weights in an iterative manner, and the fused result exhibits prominent infrared radiant intensity and distinct visible detail. Extensive experiments on three benchmarks demonstrate that CrossFuse outperforms nine other state-of-the-art methods in terms of fusion performance, generalization ability, and computational efficiency. Our code will be released at https://github.com/Zhishe-Wang/CrossFuse.
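
To make the attention-weighted fusion described above concrete, the sketch below shows one plausible PyTorch form of a cross-modal attention block that scores the activity levels of same-scale infrared and visible features through parallel channel and spatial pathways. The module name CrossModalAttentionFusion, the squeeze-and-excitation-style channel pathway, the 7x7 spatial convolution, and the softmax weighting are illustrative assumptions, not the released CrossFuse implementation.

```python
import torch
import torch.nn as nn


class CrossModalAttentionFusion(nn.Module):
    """Sketch of a cross-modal attention fusion block: parallel channel-wise
    and spatial pathways score the activity level of each modality, and the
    resulting weights blend same-scale infrared and visible feature maps."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        # Channel pathway: pooled descriptors of both modalities pass through
        # a small MLP that outputs one weight per channel per modality.
        self.channel_mlp = nn.Sequential(
            nn.Linear(2 * channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, 2 * channels),
        )
        # Spatial pathway: a 7x7 conv over mean/max-pooled maps of both
        # modalities outputs one weight per pixel per modality.
        self.spatial_conv = nn.Conv2d(4, 2, kernel_size=7, padding=3)

    def forward(self, feat_ir: torch.Tensor, feat_vis: torch.Tensor) -> torch.Tensor:
        b, c, h, w = feat_ir.shape

        # Channel attention: softmax makes the two modalities compete per channel.
        pooled = torch.cat([feat_ir, feat_vis], dim=1).mean(dim=(2, 3))   # (B, 2C)
        ch_w = torch.softmax(self.channel_mlp(pooled).view(b, 2, c, 1, 1), dim=1)

        # Spatial attention: softmax makes the two modalities compete per pixel.
        def pool(x):
            return torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)

        sp_logits = self.spatial_conv(torch.cat([pool(feat_ir), pool(feat_vis)], dim=1))
        sp_w = torch.softmax(sp_logits, dim=1).unsqueeze(2)               # (B, 2, 1, H, W)

        # Fuse: average the two attention maps and take a weighted sum.
        stacked = torch.stack([feat_ir, feat_vis], dim=1)                 # (B, 2, C, H, W)
        return (stacked * 0.5 * (ch_w + sp_w)).sum(dim=1)


# Toy usage: fuse 64-channel infrared and visible features at one scale.
if __name__ == "__main__":
    ir, vis = torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32)
    print(CrossModalAttentionFusion(64)(ir, vis).shape)  # torch.Size([1, 64, 32, 32])
```

In the full network, such weighted fusion would be applied at each scale and iterated through the decoder so that the activity levels are refined rather than fixed after a single pass.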
