A General Image Fusion Approach Exploiting Gradient Transfer Learning and Fusion Rule Unfolding.
The goal of a deep learning-based general image fusion method is to solve multiple image fusion tasks with a single model, thereby facilitating the deployment of models in practical applications. However, existing methods fail to provide an efficient and comprehensive solution from both model training and network design perspectives. Regarding model training, current approaches cannot effectively leverage complementary information across different tasks. In terms of network design, they rely on experience-based network designs. To address these issues, we propose a comprehensive framework for general image fusion using the newly proposed gradient transfer learning and fusion rule unfolding. To leverage complementary information across different tasks during training, we propose a sequential gradient-transfer framework based on the idea that different image fusion tasks often exhibit complementary structural details and that image gradients effectively capture these details. To move beyond heuristic-based network design, we evolved a fundamental image fusion rule and integrated it into a deep equilibrium model, resulting in a more efficient and versatile image fusion network capable of uniformly handling various fusion tasks. Considering three different image fusion tasks, i.e., multi-focus image fusion, multi-exposure image fusion, and infrared and visible image fusion, our method not only produces images with richer structural information but also achieves highly competitive objective metrics. Furthermore, the results of generalization experiments on previously unseen image fusion tasks, i.e., medical image fusion, demonstrate that our method significantly outperforms competing approaches.
- Research Article
66
- 10.1016/j.inffus.2022.07.016
- Jul 27, 2022
- Information Fusion
Unified gradient- and intensity-discriminator generative adversarial network for image fusion
- Research Article
198
- 10.1016/j.inffus.2022.11.010
- Nov 14, 2022
- Information Fusion
MUFusion: A general unsupervised image fusion network based on memory unit
- Research Article
32
- 10.2139/ssrn.4130858
- Jan 1, 2022
- SSRN Electronic Journal
Transfuse: A Unified Transformer-Based Image Fusion Framework Using Self-Supervised Learning
- Research Article
569
- 10.1609/aaai.v34i07.6975
- Apr 3, 2020
- Proceedings of the AAAI Conference on Artificial Intelligence
In this paper, we propose a fast unified image fusion network based on proportional maintenance of gradient and intensity (PMGI), which can realize a variety of image fusion tasks end to end, including infrared and visible image fusion, multi-exposure image fusion, medical image fusion, multi-focus image fusion, and pan-sharpening. We unify the image fusion problem into the problem of proportionally maintaining the texture and intensity of the source images. On the one hand, the network is divided into a gradient path and an intensity path for information extraction. We perform feature reuse within the same path to avoid the loss of information caused by convolution. At the same time, we introduce the pathwise transfer block to exchange information between different paths, which not only pre-fuses the gradient and intensity information but also enhances the information to be processed later. On the other hand, we define a uniform form of loss function based on these two kinds of information, which can adapt to different fusion tasks. Experiments on publicly available datasets demonstrate the superiority of our PMGI over the state of the art in terms of both visual effect and quantitative metrics in a variety of fusion tasks. In addition, our method is faster than state-of-the-art methods.
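The "uniform form of loss function" built from intensity and gradient information can be sketched roughly as follows. This is a minimal NumPy illustration, not the formulation from the paper: the function names, the use of gradient magnitude from `np.gradient`, the mean-squared-error terms, and the weights `w_int` and `w_grad` are all assumptions made for the example.

```python
import numpy as np

def gradient_magnitude(img):
    """Approximate the gradient magnitude with finite differences."""
    gy, gx = np.gradient(img.astype(np.float64))
    return np.hypot(gx, gy)

def proportional_loss(fused, src_a, src_b, w_int=(0.5, 0.5), w_grad=(0.5, 0.5)):
    """Weighted intensity and gradient fidelity to two source images.

    The weights are hypothetical knobs: changing their ratio shifts how much
    intensity or texture is kept from each source.
    """
    intensity = (w_int[0] * np.mean((fused - src_a) ** 2)
                 + w_int[1] * np.mean((fused - src_b) ** 2))
    g_f = gradient_magnitude(fused)
    gradient = (w_grad[0] * np.mean((g_f - gradient_magnitude(src_a)) ** 2)
                + w_grad[1] * np.mean((g_f - gradient_magnitude(src_b)) ** 2))
    return intensity + gradient

rng = np.random.default_rng(0)
a = rng.random((8, 8))
b = rng.random((8, 8))
loss_avg = proportional_loss((a + b) / 2, a, b)  # averaging two different sources
loss_a = proportional_loss(a, a, a)              # identical inputs: nothing to trade off
```

Under these assumptions, adapting the loss to a different fusion task amounts to choosing a different set of weights rather than a different loss form.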
- Research Article
7
- 10.3390/s22072457
- Mar 23, 2022
- Sensors
In this paper, we propose a unified and flexible framework for general image fusion tasks, including multi-exposure image fusion, multi-focus image fusion, infrared/visible image fusion, and multi-modality medical image fusion. Unlike other deep learning-based image fusion methods, which are applied to a fixed number of input sources (normally two), the proposed framework can simultaneously handle an arbitrary number of inputs. Specifically, we use a symmetric function (e.g., max-pooling) to extract the most significant features from all the input images, which are then fused with the respective features from each input source. This symmetric function makes the network permutation-invariant, which means the network can successfully extract and fuse the salient features of each image without needing to remember the order of the inputs. Permutation invariance also makes inference with a variable number of inputs more convenient. To handle multiple image fusion tasks with one unified framework, we adopt continual learning based on Elastic Weight Consolidation (EWC) for different fusion tasks. Subjective and objective experiments on several public datasets demonstrate that the proposed method outperforms state-of-the-art methods on multiple image fusion tasks.
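The permutation-invariance argument can be illustrated with a small NumPy sketch. It assumes each input image has already been encoded into a feature map of shape `(C, H, W)`; the function name `symmetric_fuse` and the choice to concatenate the pooled features along the channel axis are hypothetical and not taken from the paper.

```python
import numpy as np

def symmetric_fuse(features):
    """Fuse per-image features with an element-wise max over all inputs.

    features: list of N arrays of shape (C, H, W), one per input image.
    Returns (fused, pooled): fused has shape (N, 2C, H, W) and holds each
    input's features concatenated with the shared pooled features; pooled
    has shape (C, H, W).
    """
    stack = np.stack(features)                    # (N, C, H, W)
    pooled = stack.max(axis=0)                    # independent of input order
    tiled = np.broadcast_to(pooled, stack.shape)  # repeat pooled for every input
    fused = np.concatenate([stack, tiled], axis=1)
    return fused, pooled

rng = np.random.default_rng(0)
feats = [rng.random((4, 6, 6)) for _ in range(3)]
fused, pooled = symmetric_fuse(feats)
_, pooled_shuffled = symmetric_fuse(feats[::-1])  # same inputs, reversed order
```

Because the maximum is taken across the input axis, reordering the inputs leaves `pooled` unchanged, and the same function accepts any number of inputs.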
- Research Article
16
- 10.1109/tpami.2025.3591930
- Nov 1, 2025
- IEEE Transactions on Pattern Analysis and Machine Intelligence
Image fusion aims to merge image pairs collected by different sensors over the same scene, preserving their distinct features. Recent works have often focused on designing various image fusion losses, developing different network architectures, and leveraging downstream tasks (e.g., object detection) for image fusion. However, few studies have explored how language and semantic masks can serve as guidance to aid image fusion. In this paper, we investigate how the combination of language and masks can guide image fusion tasks, discarding the complex frameworks of previous work, which rely on downstream tasks, GAN-based cycle training, diffusion models, or deep image priors. Additionally, we exploit a recurrent neural network-like architecture to build a lightweight network that avoids the quadratic cost of traditional attention mechanisms. To adapt the receptance weighted key value (RWKV) model to the image modality, we modify it into a bidirectional version using an efficient scanning strategy (ESS). To guide image fusion by language and mask features, we introduce a multi-modal fusion module (MFM) to facilitate information exchange. Comprehensive experiments show that the proposed framework achieves state-of-the-art results in various image fusion tasks (i.e., visible-infrared image fusion, multi-focus image fusion, multi-exposure image fusion, medical image fusion, hyperspectral and multispectral image fusion, and pansharpening).
- Research Article
39
- 10.1109/tim.2021.3109379
- Jan 1, 2021
- IEEE Transactions on Instrumentation and Measurement
Advanced image fusion frameworks have focused on constructing a unified system to deliver general fusion solutions for all the potential subtasks, including visible and infrared image fusion, multi-focus image fusion, and multi-exposure image fusion. However, the fusion performance of these unified solutions cannot be guaranteed, as the involved subtasks exhibit essential diversity. Besides, the huge size of these models sacrifices their flexibility in practical applications. To this end, we propose a lightweight unified fusion network to balance the multi-level information featured across different channels and different layers. In particular, we construct a novel network architecture based on the Ghost module, endowed with a new loss function. The depth of the designed network is increased, while the number of parameters is reduced by an order of magnitude compared to existing learning-based fusion approaches. In principle, our network belongs to the autoencoder paradigm, comprising three parts: encoder, fusion layer, and decoder. Considering that the efficacy of existing autoencoder-based approaches is impeded by the single fusion strategy, we use the guided filtering approach to decompose the source image into a base layer and a detail layer to extend the diversity of the inputs. Drawing on this, we can design different fusion strategies in these two layers to adapt to different image fusion tasks. In the fusion layer, besides the three basic fusion strategies, i.e., average, max, and spatial attention, an additional gradient perception strategy is proposed to handle the multi-focus image fusion problem, improving the corresponding fusion performance. Qualitative and quantitative experiments show that our UNIFusion achieves promising results in these image fusion tasks, compared to the state of the art.
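The base/detail decomposition and per-layer fusion strategies can be sketched in NumPy as below. Two simplifications are assumed here and are not from the paper: a plain box blur stands in for the guided filter, and the average and max strategies are applied directly to pixels rather than to encoder features. All function names and the `radius` parameter are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def box_blur(img, radius=2):
    """Mean filter with edge padding (a stand-in for guided filtering)."""
    k = 2 * radius + 1
    padded = np.pad(img, radius, mode="edge")
    return sliding_window_view(padded, (k, k)).mean(axis=(-2, -1))

def decompose(img, radius=2):
    """Split an image into a smooth base layer and a residual detail layer."""
    base = box_blur(img, radius)
    return base, img - base

def fuse_two_layer(a, b, radius=2):
    """Average the base layers and keep the larger-magnitude detail per pixel."""
    base_a, det_a = decompose(a, radius)
    base_b, det_b = decompose(b, radius)
    base = (base_a + base_b) / 2
    detail = np.where(np.abs(det_a) >= np.abs(det_b), det_a, det_b)
    return base + detail

rng = np.random.default_rng(0)
a = rng.random((16, 16))
b = rng.random((16, 16))
fused = fuse_two_layer(a, b)
```

Splitting the inputs this way lets each layer use its own rule: here the smooth content is averaged, while the detail layer keeps whichever source has the stronger local response.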
- Research Article
10
- 10.1016/j.infrared.2022.104417
- Oct 29, 2022
- Infrared Physics & Technology
GeFuNet: A knowledge-guided deep network for the infrared and visible image fusion
- Research Article
103
- 10.1016/j.inffus.2022.09.030
- Oct 7, 2022
- Information Fusion
SGFusion: A saliency guided deep-learning framework for pixel-level image fusion
- Research Article
25
- 10.1016/j.neucom.2021.08.044
- Aug 14, 2021
- Neurocomputing
A light-weight, efficient, and general cross-modal image fusion network
- Research Article
159
- 10.1016/j.ins.2021.04.052
- Apr 20, 2021
- Information Sciences
Multimodal medical image fusion based on joint bilateral filter and local gradient energy
- Research Article
50
- 10.1109/tip.2024.3512365
- Jan 1, 2025
- IEEE Transactions on Image Processing
Image fusion facilitates the integration of information from various source images of the same scene into a composite image, thereby benefiting perception, analysis, and understanding. Recently, diffusion models have demonstrated impressive generative capabilities in the field of computer vision, suggesting significant potential for application in image fusion. The forward process in diffusion models requires the gradual addition of noise to the original data. However, typical unsupervised image fusion tasks (e.g., infrared-visible, medical, and multi-exposure image fusion) lack ground truth images (corresponding to the original data in diffusion models), thereby preventing the direct application of diffusion models. To address this problem, we propose a versatile diffusion model-based unsupervised framework for image fusion, termed VDMUFusion. In the proposed method, we integrate the fusion problem into the diffusion sampling process by formulating image fusion as a weighted average process and establishing appropriate assumptions about the noise in the diffusion model. To simplify the training process, we propose a multi-task learning framework that replaces the original noise prediction network, allowing for simultaneous prediction of noise and fusion weights. Meanwhile, our method employs joint training across various fusion tasks, which significantly improves noise prediction accuracy and yields higher quality fused images compared to training on a single task. Extensive experimental results demonstrate that the proposed method delivers very competitive performance across various image fusion tasks. The code is available at https://github.com/yuliu316316/VDMUFusion.
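The idea of treating image fusion as a weighted average can be sketched as follows. This NumPy example only illustrates the averaging step; it does not reproduce the diffusion sampling process or the released VDMUFusion code. The function name, the per-pixel `logits` input, and the softmax normalization are assumptions made for the example.

```python
import numpy as np

def weighted_average_fusion(sources, logits):
    """Fuse N source images with per-pixel weights that sum to one.

    sources, logits: arrays of shape (N, H, W). The logits stand in for
    whatever scores a weight-prediction network would output; a softmax
    over the source axis turns them into convex weights.
    """
    shifted = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    weights = np.exp(shifted)
    weights /= weights.sum(axis=0, keepdims=True)
    fused = (weights * sources).sum(axis=0)
    return fused, weights

rng = np.random.default_rng(0)
sources = rng.random((2, 8, 8))        # e.g. an infrared and a visible image
logits = rng.normal(size=(2, 8, 8))    # hypothetical predicted scores
fused, weights = weighted_average_fusion(sources, logits)
```

Because the weights are non-negative and sum to one at every pixel, each fused value lies between the smallest and largest source value at that location.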
- Research Article
57
- 10.1109/tmm.2021.3129354
- Jan 1, 2023
- IEEE Transactions on Multimedia
This paper proposes an image fusion framework based on separate representation learning, called IFSepR. Based on prior knowledge, we assume that both co-modal and multi-modal images have common and private features, and that exploiting this disentangled representation can help image fusion, especially fusion rule design. Inspired by the autoencoder network and contrastive learning, a multi-branch encoder with contrastive constraints is built to learn the common and private features of paired images. In the fusion stage, a general fusion rule based on the disentangled features is designed to integrate the private features; the fused private features are then combined with the common features and fed into the decoder to reconstruct the fused image. We perform a series of evaluations on three typical image fusion tasks: multi-focus image fusion, infrared and visible image fusion, and medical image fusion. Quantitative and qualitative comparisons with five state-of-the-art image fusion methods demonstrate the advantages of our proposed model.
- Research Article
11
- 10.1016/j.cviu.2024.104218
- Nov 6, 2024
- Computer Vision and Image Understanding
UUD-Fusion: An unsupervised universal image fusion approach via generative diffusion model
- Research Article
53
- 10.1016/j.inffus.2024.102414
- Apr 8, 2024
- Information Fusion
A general image fusion framework using multi-task semi-supervised learning