Abstract

Infrared and visible image fusion aims to integrate complementary information from different types of images into a single image. Existing image fusion methods are primarily based on convolutional neural networks (CNNs), which ignore long-range dependencies in images, so the fusion network cannot generate images with good complementarity. Motivated by the importance of global information, we introduce the transformer into a CNN-based fusion network to improve image-level perception in complex fusion scenes. In this paper, we propose an end-to-end image fusion framework based on a transformer and a hybrid feature extractor, which enables the network to attend to both global and local information and uses the characteristics of the transformer to compensate for the limitations of CNNs. In our network, a dual-branch CNN module extracts shallow image features, a vision transformer module then captures the global channel and spatial relationships within these features, and an image reconstruction module produces the fused result. Using a pre-trained VGG19 network, we compute the loss on features at different depths according to the type of source image. Experimental results demonstrate the effectiveness of the added vision transformer module, and compared with traditional and deep learning methods, our method achieves state-of-the-art qualitative and quantitative performance.
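
The sketch below illustrates the kind of pipeline the abstract describes: a dual-branch CNN for shallow features, a transformer module for global relationships, a convolutional reconstruction head, and a frozen VGG19 perceptual loss computed at different depths per source modality. It is a minimal PyTorch sketch under assumed layer sizes, depths, and VGG19 layer indices; it is not the authors' exact architecture or loss weighting.

```python
# Minimal sketch of the described pipeline (assumed sizes and layer choices,
# not the authors' exact design).
import torch
import torch.nn as nn
from torchvision.models import vgg19


class ConvBranch(nn.Module):
    """Shallow CNN branch for one modality (infrared or visible)."""
    def __init__(self, in_ch=1, feat_ch=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.body(x)


class GlobalTransformer(nn.Module):
    """Treats each spatial position as a token so self-attention can model
    the long-range relationships that local convolutions miss."""
    def __init__(self, dim=64, heads=4, depth=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           dim_feedforward=dim * 2,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, x):                      # x: (B, C, H, W)
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)  # (B, H*W, C)
        tokens = self.encoder(tokens)
        return tokens.transpose(1, 2).reshape(b, c, h, w)


class FusionNet(nn.Module):
    """Dual-branch CNN -> transformer -> convolutional reconstruction."""
    def __init__(self, feat_ch=32):
        super().__init__()
        self.ir_branch = ConvBranch(1, feat_ch)
        self.vis_branch = ConvBranch(1, feat_ch)
        self.transformer = GlobalTransformer(dim=2 * feat_ch)
        self.reconstruct = nn.Sequential(
            nn.Conv2d(2 * feat_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, 1, 3, padding=1), nn.Tanh(),
        )

    def forward(self, ir, vis):
        feats = torch.cat([self.ir_branch(ir), self.vis_branch(vis)], dim=1)
        feats = self.transformer(feats)
        return self.reconstruct(feats)


def vgg19_perceptual_loss(fused, ir, vis, layers_ir=(3,), layers_vis=(8, 17)):
    """Perceptual loss from a frozen VGG19: shallower features for the infrared
    target, deeper features for the visible target (indices are assumptions)."""
    vgg = vgg19(weights="IMAGENET1K_V1").features.eval()
    for p in vgg.parameters():
        p.requires_grad_(False)

    def feats(img, idxs):
        out, keep = img.repeat(1, 3, 1, 1), []  # grey -> 3 channels for VGG
        for i, layer in enumerate(vgg):
            out = layer(out)
            if i in idxs:
                keep.append(out)
        return keep

    loss = sum(nn.functional.mse_loss(a, b)
               for a, b in zip(feats(fused, layers_ir), feats(ir, layers_ir)))
    loss = loss + sum(nn.functional.mse_loss(a, b)
                      for a, b in zip(feats(fused, layers_vis), feats(vis, layers_vis)))
    return loss


if __name__ == "__main__":
    net = FusionNet()
    ir, vis = torch.rand(1, 1, 64, 64), torch.rand(1, 1, 64, 64)
    fused = net(ir, vis)
    print(fused.shape, vgg19_perceptual_loss(fused, ir, vis).item())
```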
