Diff-IF: Multi-modality image fusion via diffusion model with fusion knowledge prior

Similar Papers
  • Research Article
  • Cited by 22
  • 10.1109/tcsvt.2022.3163649
Designing CNNs for Multimodal Image Restoration and Fusion via Unfolding the Method of Multipliers
  • Sep 1, 2022
  • IEEE Transactions on Circuits and Systems for Video Technology
  • Iman Marivani + 3 more

Multimodal (also known as guided) image restoration is the reconstruction of a degraded image from a target modality with the aid of a high-quality image from another modality. A related task is image fusion, which merges images from different modalities into a composite image. Traditional approaches to multimodal image restoration and fusion include analytical methods that are computationally expensive at inference time. Recently developed deep learning methods have shown strong performance at reduced computational cost; however, since these methods do not incorporate prior knowledge about the problem at hand, they result in a “black box” model, that is, one can hardly say what the model has learned. In this paper, we formulate multimodal image restoration and fusion as a coupled convolutional sparse coding problem and adopt the Method of Multipliers (MM) for its solution. Then, we use the MM-based solution to design a convolutional neural network (CNN) encoder that follows the principle of deep unfolding. To address multimodal image restoration and fusion, we design two multimodal models that employ the proposed encoder followed by an appropriately designed decoder that maps the learned representations to the desired output. Unlike most existing deep learning designs, which comprise multiple encoding branches followed by a concatenation or linear-combination fusion block, the proposed design provides an efficient and structured way to fuse information at different stages of the network, yielding representations that lead to accurate image reconstruction. The proposed models are applied to three image restoration tasks and two image fusion tasks. Quantitative and qualitative comparisons against various state-of-the-art analytical and deep learning methods corroborate the superior performance of the proposed framework.
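The abstract above builds its encoder by unfolding an augmented-Lagrangian solver for sparse coding. As a rough illustration of what such an iteration looks like before unfolding, the sketch below solves a plain, single-modality, non-convolutional L1 sparse coding problem with ADMM, a close relative of the Method of Multipliers that alternates the x and z minimizations instead of performing them jointly. The function names, the dense dictionary `D`, and the parameters `lam` and `rho` are illustrative assumptions; this is not the coupled convolutional model from the paper.

```python
import numpy as np


def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1: elementwise shrinkage toward zero."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def admm_sparse_code(D, y, lam, rho=1.0, n_iter=200):
    """Approximately solve min_x 0.5*||D x - y||^2 + lam*||x||_1 via the split x = z.

    Each iteration performs an x-update (quadratic solve), a z-update
    (soft thresholding), and a dual ascent step on the scaled multiplier u.
    """
    n = D.shape[1]
    z = np.zeros(n)
    u = np.zeros(n)  # scaled Lagrange multiplier for the constraint x - z = 0
    # The x-update solves (D^T D + rho I) x = D^T y + rho (z - u);
    # the system matrix never changes, so invert it once (fine for a small demo).
    A_inv = np.linalg.inv(D.T @ D + rho * np.eye(n))
    Dty = D.T @ y
    for _ in range(n_iter):
        x = A_inv @ (Dty + rho * (z - u))
        z = soft_threshold(x + u, lam / rho)
        u = u + x - z
    return z
```

In deep unfolding, each loop iteration would become a network layer, with quantities such as `D`, the threshold, and `rho` learned per layer instead of fixed.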

  • Research Article
  • Cited by 11
  • 10.1016/j.jestch.2022.101245
Enhanced JAYA optimization based medical image fusion in adaptive non subsampled shearlet transform domain
  • Sep 13, 2022
  • Engineering Science and Technology, an International Journal
  • Suresh Shilpa + 3 more

  • Research Article
  • Cited by 31
  • 10.1016/j.jksuci.2023.101733
Multimodal medical image fusion towards future research: A review
  • Aug 29, 2023
  • Journal of King Saud University - Computer and Information Sciences
  • Sajid Ullah Khan + 5 more

  • Book Chapter
  • Cited by 4
  • 10.1016/b978-0-44-313233-9.00017-5
Chapter 11 - Deep learning-based multimodal medical image fusion
  • Jan 1, 2024
  • Data Fusion Techniques and Applications for Smart Healthcare
  • Aditya Kahol + 1 more

  • Research Article
  • Cited by 1
  • 10.46532/978-81-950008-1-4_088
A Comprehensive Review of Multimodal Medical Image Fusion Techniques
  • Dec 30, 2020
  • Innovations in Information and Communication Technology Series
  • Jakir Hussain G K + 3 more

Multimodal image fusion is the process of combining relevant information from multiple imaging modalities. A fused image provides a more complete description of the scene than any single source image, and image fusion techniques are widely used in real-world applications such as agriculture, robotics and informatics, aeronautics, the military, medicine, and pedestrian detection. We give an outline of the multimodal medical image fusion methods developed over time. Fusing medical images in various combinations supports medical diagnosis and examination. There has been remarkable progress in deep learning, AI, and bio-inspired optimization techniques, and these techniques are often used to further improve the effectiveness of image fusion algorithms.

  • Research Article
  • Cited by 11
  • 10.3390/computers10100129
A Novel Multi-Modality Image Simultaneous Denoising and Fusion Method Based on Sparse Representation
  • Oct 13, 2021
  • Computers
  • Guanqiu Qi + 4 more

Multi-modality image fusion, applied to improve image quality, has drawn great attention from researchers in recent years. However, images captured by different types of imaging sensors contain noise, which can seriously degrade the performance of multi-modality image fusion. In the basic approach to noisy image fusion, the source images are denoised first and the denoised images are then fused. However, denoising can reduce the sharpness of the source images and thereby degrade fusion performance. Additionally, performing denoising and fusion as separate processes increases computational cost. To fuse noisy multi-modality image pairs accurately and efficiently, a simultaneous multi-modality image denoising and fusion method is proposed. In the proposed method, noisy source images are decomposed into cartoon and texture components. Cartoon-texture decomposition not only separates the source images into structure and detail components for different image fusion schemes, but also isolates image noise in the texture components. A Gaussian scale mixture (GSM) based sparse representation model is presented for the denoising and fusion of the texture components, while a spatial domain fusion rule is applied to the cartoon components. Comparative experimental results confirm that the proposed simultaneous image denoising and fusion method is superior to state-of-the-art methods in both visual and quantitative evaluations.
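To make the cartoon/texture pipeline concrete, here is a minimal NumPy sketch under strong simplifying assumptions: a box filter stands in for the cartoon-texture decomposition, a hard threshold `tau` stands in for the GSM-based sparse-representation denoising, the cartoon layers are fused with a local-energy weighted average (one possible spatial-domain rule), and the texture layers with a max-absolute rule. None of these choices are taken from the paper; all function names are hypothetical.

```python
import numpy as np


def box_blur(img, radius=2):
    """Mean filter with edge padding; a crude stand-in for a cartoon (structure) extractor."""
    k = 2 * radius + 1
    padded = np.pad(img, radius, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)


def cartoon_texture_fuse(a, b, radius=2, tau=0.0):
    a = a.astype(float)
    b = b.astype(float)
    ca, cb = box_blur(a, radius), box_blur(b, radius)  # cartoon (structure) layers
    ta, tb = a - ca, b - cb                            # texture (detail + noise) layers
    # Crude denoising stand-in: zero out small-magnitude texture coefficients.
    ta = np.where(np.abs(ta) < tau, 0.0, ta)
    tb = np.where(np.abs(tb) < tau, 0.0, tb)
    # Spatial-domain rule for cartoon layers: weight by local energy.
    ea, eb = box_blur(ca ** 2, radius), box_blur(cb ** 2, radius)
    w = ea / (ea + eb + 1e-12)
    fused_cartoon = w * ca + (1.0 - w) * cb
    # Detail rule for texture layers: keep the larger-magnitude coefficient.
    fused_texture = np.where(np.abs(ta) >= np.abs(tb), ta, tb)
    return fused_cartoon + fused_texture
```

Because cartoon plus texture reconstructs each source exactly when `tau` is 0, fusing an image with itself returns that image, which is a quick sanity check for any decomposition-based rule.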

  • Research Article
  • Cited by 159
  • 10.1016/j.bspc.2021.102480
An image quality enhancement scheme employing adolescent identity search algorithm in the NSST domain for multimodal medical image fusion
  • Feb 19, 2021
  • Biomedical Signal Processing and Control
  • Jais Jose + 6 more

  • Book Chapter
  • Cited by 2
  • 10.1007/978-3-030-64559-5_13
Multi-modal Image Fusion Based on Weight Local Features and Novel Sum-Modified-Laplacian in Non-subsampled Shearlet Transform Domain
  • Jan 1, 2020
  • Hajer Ouerghi + 2 more

Multi-modal medical image fusion plays a significant role in clinical applications such as noninvasive diagnosis and image-guided surgery. However, designing an efficient image fusion technique remains challenging. In this paper, we propose an improved multi-modal medical image fusion method that enhances the visual quality and contrast of the fused image. To this end, the registered source images are first decomposed into low-frequency (LF) and several high-frequency (HF) sub-images via the non-subsampled shearlet transform (NSST). Next, the LF sub-images are combined using the proposed weight local features fusion rule, based on local energy and standard deviation, while the HF sub-images are fused using the novel sum-modified-Laplacian (NSML) technique. Finally, the inverse NSST is applied to reconstruct the fused image. Furthermore, the proposed method is extended to color multi-modal image fusion, where it effectively restrains color distortion and enhances spatial and spectral resolution. To evaluate performance, various experiments were conducted on different datasets of gray-scale and color images. Experimental results show that the proposed scheme outperforms other state-of-the-art algorithms in both visual effects and objective criteria.
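The high-frequency rule builds on the sum-modified-Laplacian (SML) focus measure. The sketch below implements the classic SML (the modified Laplacian summed over a small window) with a choose-max rule for fusing two high-frequency sub-bands. It does not reproduce the paper's NSML variant, the NSST decomposition, or the LF rule; the function names and window radius are illustrative assumptions.

```python
import numpy as np


def modified_laplacian(img):
    """ML(x, y) = |2I - I_left - I_right| + |2I - I_up - I_down|, with edge padding."""
    p = np.pad(img.astype(float), 1, mode="edge")
    c = p[1:-1, 1:-1]
    ml_x = np.abs(2 * c - p[1:-1, :-2] - p[1:-1, 2:])  # horizontal second difference
    ml_y = np.abs(2 * c - p[:-2, 1:-1] - p[2:, 1:-1])  # vertical second difference
    return ml_x + ml_y


def window_sum(x, radius=1):
    """Sum of x over a (2r+1) x (2r+1) neighborhood, with edge padding."""
    k = 2 * radius + 1
    p = np.pad(x, radius, mode="edge")
    out = np.zeros_like(x, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += p[dy:dy + x.shape[0], dx:dx + x.shape[1]]
    return out


def sml(img, radius=1):
    return window_sum(modified_laplacian(img), radius)


def fuse_high_freq(ha, hb, radius=1):
    """Choose-max rule: keep the coefficient whose neighborhood has the larger SML."""
    return np.where(sml(ha, radius) >= sml(hb, radius), ha, hb)
```

Some fusion papers square the modified Laplacian or apply a threshold before summing; those variants change only `modified_laplacian` or `sml`.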

  • Research Article
  • Cited by 15
  • 10.1109/tpami.2025.3591930
An Efficient Image Fusion Network Exploiting Unifying Language and Mask Guidance
  • Nov 1, 2025
  • IEEE Transactions on Pattern Analysis and Machine Intelligence
  • Zi-Han Cao + 3 more

Image fusion aims to merge image pairs collected by different sensors over the same scene while preserving their distinct features. Recent works have often focused on designing various image fusion losses, developing different network architectures, and leveraging downstream tasks (e.g., object detection) for image fusion. However, few studies have explored how language and semantic masks can serve as guidance to aid image fusion. In this paper, we investigate how the combination of language and masks can guide image fusion tasks, dispensing with the complex frameworks of previous work that rely on downstream tasks, GAN-based cycle training, diffusion models, or deep image priors. Additionally, we exploit a recurrent neural network-like architecture to build a lightweight network that avoids the quadratic cost of traditional attention mechanisms. To adapt the receptance weighted key value (RWKV) model to the image modality, we modify it into a bidirectional version using an efficient scanning strategy (ESS). To guide image fusion with language and mask features, we introduce a multi-modal fusion module (MFM) that facilitates information exchange. Comprehensive experiments show that the proposed framework achieves state-of-the-art results in various image fusion tasks (i.e., visible-infrared image fusion, multi-focus image fusion, multi-exposure image fusion, medical image fusion, hyperspectral and multispectral image fusion, and pansharpening).
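The point about avoiding quadratic attention cost can be illustrated with a simplified, RWKV-style bidirectional aggregation in which every token mixes in all others with an exponential distance decay. Computing it naively costs O(T^2), but a forward and a backward recurrence give the same result in O(T). This is a hypothetical, scalar-channel simplification (no separate current-token bonus, no learned per-channel decays, no image scanning order) and not the paper's ESS or the exact RWKV formulation; the function names are illustrative.

```python
import numpy as np


def bidirectional_decay_mix(k, v, w):
    """out_t = sum_i exp(-w*|t-i| + k_i) v_i / sum_i exp(-w*|t-i| + k_i), computed in O(T)."""
    T = len(k)
    decay = np.exp(-w)
    a = np.exp(k)  # per-token weights (no max-subtraction; fine for small k)
    num_f, den_f = np.zeros(T), np.zeros(T)
    num_b, den_b = np.zeros(T), np.zeros(T)
    acc_n = acc_d = 0.0
    for t in range(T):  # forward pass: sums over i <= t
        acc_n = decay * acc_n + a[t] * v[t]
        acc_d = decay * acc_d + a[t]
        num_f[t], den_f[t] = acc_n, acc_d
    acc_n = acc_d = 0.0
    for t in range(T - 1, -1, -1):  # backward pass: sums over i >= t
        acc_n = decay * acc_n + a[t] * v[t]
        acc_d = decay * acc_d + a[t]
        num_b[t], den_b[t] = acc_n, acc_d
    # Token t is counted in both passes, so subtract its own term once.
    num = num_f + num_b - a * v
    den = den_f + den_b - a
    return num / den


def naive_decay_mix(k, v, w):
    """Reference O(T^2) version that builds the full pairwise weight matrix."""
    idx = np.arange(len(k))
    weights = np.exp(-w * np.abs(idx[:, None] - idx[None, :]) + k[None, :])
    return weights @ v / weights.sum(axis=1)
```

The two functions agree up to floating-point rounding; only the recurrent one scales linearly with the number of tokens, which is what makes this family of models attractive for high-resolution images.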

  • Research Article
  • 10.61189/617079irudnn
A review of multimodal medical image fusion: Developments in traditional, model-based and learning-based approaches
  • Dec 31, 2025
  • Perioperative Precision Medicine
  • Zhaopeng Zhou + 5 more

Multimodal medical image fusion technology optimizes image content by integrating images from diverse modalities, such as Computed Tomography (CT), Positron Emission Tomography (PET), Magnetic Resonance Imaging (MRI), and Single Photon Emission Computed Tomography (SPECT), while retaining critical information. With the rapid advancements in medical imaging technology, single-modal approaches have limitations in capturing comprehensive anatomical or functional characteristics. As a result, researchers are increasingly turning to multimodal fusion methods to enhance diagnostic accuracy and provide richer data for classification, detection, and segmentation tasks. In particular, during the perioperative period, multimodal image fusion plays a crucial role in surgical planning, intraoperative navigation, and postoperative evaluation, enabling precise localization of lesions and improving clinical decision-making. This paper presents a survey of the latest literature on medical image fusion, covering three major approaches: traditional methods, model-based methods, and learning-based methods. It discusses the advantages and limitations of each approach, with a particular emphasis on traditional image processing techniques, model-based fusion methods, and the integration of emerging deep learning (DL) technologies. Comparative experimental analysis highlights performance differences among these methods in terms of information retention, computational efficiency, and clinical applicability. Finally, the paper reviews performance evaluation metrics for multimodal fusion and provides recommendations for future research to further promote the widespread adoption of this technology in clinical diagnostics and intelligent healthcare.
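Information-retention metrics are a common part of such comparisons. As an assumed example (the abstract does not list specific metrics), the sketch below computes two widely used ones for 8-bit images: the entropy (EN) of an image and the fusion mutual information, taken here as MI(F, A) + MI(F, B) between the fused image F and each source A, B. Function names and the 256-bin histogram are illustrative choices.

```python
import numpy as np


def entropy(img, levels=256):
    """Shannon entropy (bits) of the gray-level histogram of an 8-bit image."""
    hist, _ = np.histogram(img, bins=levels, range=(0, levels))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())


def mutual_information(x, y, levels=256):
    """Mutual information (bits) between two images from their joint gray-level histogram."""
    joint, _, _ = np.histogram2d(
        np.ravel(x), np.ravel(y), bins=levels, range=[[0, levels], [0, levels]]
    )
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)  # marginal of x, shape (levels, 1)
    py = pxy.sum(axis=0, keepdims=True)  # marginal of y, shape (1, levels)
    nz = pxy > 0
    return float((pxy[nz] * np.log2(pxy[nz] / (px * py)[nz])).sum())


def fusion_mi(fused, a, b, levels=256):
    """Fusion MI metric: information the fused image shares with each source."""
    return mutual_information(fused, a, levels) + mutual_information(fused, b, levels)
```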

  • Research Article
  • Cited by 375
  • 10.1016/j.compbiomed.2022.105253
A review on multimodal medical image fusion: Compendious analysis of medical modalities, multimodal databases, fusion techniques and quality metrics
  • Feb 3, 2022
  • Computers in Biology and Medicine
  • Muhammad Adeel Azam + 7 more

  • Research Article
  • Cited by 9
  • 10.2174/0118750362370697250630063814
A Review of Deep Learning-based Multi-modal Medical Image Fusion
  • Jul 4, 2025
  • The Open Bioinformatics Journal
  • Shailesh Bhosekar + 4 more

Introduction: Medical image fusion combines data obtained from different imaging modalities, such as Computed Tomography (CT), Positron Emission Tomography (PET), and Magnetic Resonance Imaging (MRI), into a single, informative image that aids clinicians in diagnosis and treatment planning. No single imaging modality can provide complete information on its own, which has led to a research field focused on integrating data from multiple modalities to maximize the information in a single, unified representation.
Methods: A Convolutional Neural Network (CNN) was applied to achieve robust and effective multi-modal image fusion. In addition to examining the principles and practical applications of this deep learning approach, the paper provides a comparative analysis of CNN-based results against conventional fusion techniques.
Results: CNN-based image fusion delivers markedly better results than conventional fusion methods in both qualitative and quantitative analysis. The paper also discusses future perspectives, emphasizing advancements in deep learning that could drive the evolution of CNN-based fusion and enhance its effectiveness in medical imaging.
Discussion: CNN-based multi-modal medical image fusion shows strong advantages over traditional methods in feature preservation and adaptability. However, challenges such as data dependency, computational complexity, and generalization across modalities persist. Emerging trends such as attention mechanisms and transformer models show promise in addressing these gaps. Future work should focus on improving interpretability and clinical applicability so that deep learning fusion methods can be reliably integrated into real-world diagnostic systems.
Conclusion: Ultimately, this work underscores the potential of CNN-based fusion to improve patient outcomes and shape the future of medical imaging by advancing the understanding of multi-modal fusion.

  • Research Article
  • 10.3760/cma.j.issn.1001-2346.2019.12.007
Consistency of neurovascular relationship between multimodal image fusion 3D reconstruction and intraoperative findings of microvascular decompression for primary trigeminal neuralgia
  • Dec 28, 2019
  • Chinese Journal of Neurosurgery
  • Yingbin Jiao + 7 more

Objective: To explore the consistency of neurovascular relationships between multimodal image fusion 3D reconstruction and intraoperative findings in microvascular decompression (MVD) for primary trigeminal neuralgia (PTN).
Methods: A retrospective analysis was conducted on the clinical data of 50 PTN patients treated with MVD at the Department of Neurosurgery, Qingdao University Hospital, from January to November 2018. All subjects underwent three-dimensional time-of-flight magnetic resonance angiography (3D-TOF-MRA) and three-dimensional fast imaging employing steady-state acquisition (3D-FIESTA) sequences. The 3D Slicer software was then used to reconstruct the multimodal fusion 3D images. The multimodal image fusion 3D reconstruction images and the surgical videos were analyzed to determine the offending vessels responsible for trigeminal neuralgia, together with the direction, site, and degree of compression of the trigeminal nerve. The kappa test was used to assess the agreement between the two approaches.
Results: With MVD as the reference standard, the accuracies of the multimodal image fusion 3D reconstruction images in determining the offending vessel, direction of compression, compression site, and degree of compression were 92.0% (46/50), 92.0% (46/50), 96.0% (48/50), and 58.0% (29/50), respectively. The 3D reconstruction images and MVD showed high agreement in identifying the offending vessel, compression direction, and compression site (kappa values 0.729, 0.903, and 0.955, respectively; all P<0.001). However, agreement was poor in judging the degree to which the offending vessel compressed the trigeminal nerve (kappa=0.227, P=0.002). The degree of compression was higher in the intraoperative MVD findings than in the multimodal image fusion 3D reconstruction (mean values 2.57 and 1.58, respectively; Z=-4.499, P<0.001).
Conclusions: Preoperative multimodal image fusion 3D reconstruction can help accurately determine the offending vessel, compression direction, and compression site in PTN, and appears highly consistent with intraoperative MVD findings. We tentatively suggest that it could serve as one of the methods facilitating preoperative diagnosis.
Key words: Trigeminal neuralgia; Multimodal image fusion; Microvascular decompression; Neurovascular relationship; Computer-aided diagnosis
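The agreement statistics reported in this abstract are Cohen's kappa values. A minimal sketch of the statistic for a square agreement table (rows: categories assigned from the 3D reconstruction, columns: categories from the MVD findings) follows; the function name is illustrative, and the abstract does not provide the underlying tables, so the example below uses an invented one.

```python
import numpy as np


def cohens_kappa(table):
    """Cohen's kappa for a square agreement table.

    table[i][j] counts cases rated category i by the first method and j by the second.
    """
    t = np.asarray(table, dtype=float)
    n = t.sum()
    p_o = np.trace(t) / n                          # observed agreement
    p_e = (t.sum(axis=1) @ t.sum(axis=0)) / n**2   # agreement expected by chance
    return float((p_o - p_e) / (1.0 - p_e))
```

For the invented table [[20, 5], [10, 15]], observed agreement is 0.7 and chance agreement is 0.5, so kappa is 0.4. This is why a high raw accuracy can still correspond to a modest kappa when one category dominates.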

  • Research Article
  • 10.1109/tpami.2026.3655694
A General Image Fusion Approach Exploiting Gradient Transfer Learning and Fusion Rule Unfolding
  • Jan 1, 2026
  • IEEE Transactions on Pattern Analysis and Machine Intelligence
  • Wu Wang + 3 more

The goal of a deep learning-based general image fusion method is to solve multiple image fusion tasks with a single model, thereby facilitating the deployment of models in practical applications. However, existing methods fail to provide an efficient and comprehensive solution from either the model-training or the network-design perspective. Regarding model training, current approaches cannot effectively leverage complementary information across different tasks. In terms of network design, they rely on experience-based architectures. To address these issues, we propose a comprehensive framework for general image fusion built on the newly proposed gradient transfer learning and fusion rule unfolding. To leverage complementary information across different tasks during training, we propose a sequential gradient-transfer framework, based on the observation that different image fusion tasks often exhibit complementary structural details and that image gradients effectively capture these details. To move beyond heuristic network design, we evolve a fundamental image fusion rule and integrate it into a deep equilibrium model, yielding a more efficient and versatile image fusion network that can handle various fusion tasks uniformly. On three image fusion tasks (multi-focus image fusion, multi-exposure image fusion, and infrared and visible image fusion), our method not only produces images with richer structural information but also achieves highly competitive objective metrics. Furthermore, generalization experiments on a previously unseen image fusion task, medical image fusion, show that our method significantly outperforms competing approaches. The code will be made available if the paper is accepted.
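A deep equilibrium model defines its output as a fixed point z* = f(z*, x) of a single layer instead of stacking a fixed number of layers. As a toy illustration of that idea, and of using image gradients to carry structural detail, the sketch below takes f to be one gradient-descent step on a hypothetical 1D gradient-domain fusion energy, E(f) = ||f - (a + b)/2||^2 + lam * ||D f - g||^2, where g keeps the larger-magnitude source gradient, and iterates until f stops changing. The energy, the step size, and all names are assumptions, not the paper's fusion rule; a real DEQ would place learned parameters in f, use a faster root solver, and backpropagate via implicit differentiation.

```python
import numpy as np


def diff_matrix(n):
    """Forward-difference operator D with (D f)[i] = f[i + 1] - f[i]."""
    D = np.zeros((n - 1, n))
    idx = np.arange(n - 1)
    D[idx, idx] = -1.0
    D[idx, idx + 1] = 1.0
    return D


def max_abs_gradient(a, b, D):
    """Target gradient: at each position keep the source gradient with larger magnitude."""
    ga, gb = D @ a, D @ b
    return np.where(np.abs(ga) >= np.abs(gb), ga, gb)


def fusion_layer(f, avg, g, D, lam, step):
    """One gradient step on E(f) = ||f - avg||^2 + lam * ||D f - g||^2."""
    grad = 2.0 * (f - avg) + 2.0 * lam * D.T @ (D @ f - g)
    return f - step * grad


def equilibrium_fuse(a, b, lam=1.0, step=0.1, tol=1e-10, max_iter=10_000):
    """Iterate the layer until f stops changing, i.e. until f is (nearly) a fixed point."""
    D = diff_matrix(len(a))
    avg = 0.5 * (a + b)
    g = max_abs_gradient(a, b, D)
    f = avg.copy()
    for _ in range(max_iter):
        f_next = fusion_layer(f, avg, g, D, lam, step)
        if np.max(np.abs(f_next - f)) < tol:
            return f_next
        f = f_next
    return f
```

With `step` below 1 / (1 + 4 * lam), the update is a contraction, so the iteration converges to the same f that solves (I + lam D^T D) f = (a + b)/2 + lam D^T g directly.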

  • Research Article
  • Cited by 111
  • 10.1016/j.bspc.2017.01.003
Optimum spectrum mask based medical image fusion using Gray Wolf Optimization
  • Jan 14, 2017
  • Biomedical Signal Processing and Control
  • Ebenezer Daniel + 3 more
