
A Review of Deep Learning-based Multi-modal Medical Image Fusion

Abstract

Introduction: Medical image fusion combines the data obtained from different imaging modalities, such as Computed Tomography (CT), Positron Emission Tomography (PET), and Magnetic Resonance Imaging (MRI), into a single, informative image that aids clinicians in diagnosis and treatment planning. No single imaging modality can provide complete information on its own. This has led to the emergence of a research field focused on integrating data from multiple modalities to maximize information in a single, unified representation.

Methods: A Convolutional Neural Network (CNN) was applied to achieve robust and effective multi-modal image fusion. In addition to examining the principles and practical applications of this deep learning approach, the paper provides a comparative analysis of CNN-based results against conventional fusion techniques.

Results: CNN-based image fusion delivers far better results than conventional fusion methods in both qualitative and quantitative analysis. The paper also discusses future perspectives, emphasizing advancements in deep learning that could drive the evolution of CNN-based fusion and enhance its effectiveness in medical imaging.

Discussion: CNN-based multi-modal medical image fusion shows strong advantages over traditional methods in terms of feature preservation and adaptability. However, challenges such as data dependency, computational complexity, and generalization across modalities persist. Emerging trends like attention mechanisms and transformer models show promise in addressing these gaps. Future work should focus on improving interpretability and clinical applicability, ensuring that deep learning fusion methods can be reliably integrated into real-world diagnostic systems.

Conclusion: Ultimately, this work underscores the potential of CNN-based fusion to improve patient outcomes and shape the future of medical imaging by advancing the understanding of multi-modal fusion.

Similar Papers
  • Research Article
  • 10.61189/617079irudnn
A review of multimodal medical image fusion Developments in traditional, model-based and learning-based approaches
  • Dec 31, 2025
  • Perioperative Precision Medicine
  • Zhaopeng Zhou + 5 more

Multimodal medical image fusion technology optimizes image content by integrating images from diverse modalities, such as Computed Tomography (CT), Positron Emission Tomography (PET), Magnetic Resonance Imaging (MRI), and Single Photon Emission Computed Tomography (SPECT), while retaining critical information. With the rapid advancements in medical imaging technology, single-modal approaches have limitations in capturing comprehensive anatomical or functional characteristics. As a result, researchers are increasingly turning to multimodal fusion methods to enhance diagnostic accuracy and provide richer data for classification, detection, and segmentation tasks. In particular, during the perioperative period, multimodal image fusion plays a crucial role in surgical planning, intraoperative navigation, and postoperative evaluation, enabling precise localization of lesions and improving clinical decision-making. This paper presents a survey of the latest literature on medical image fusion, covering three major approaches: traditional methods, model-based methods, and learning-based methods. It discusses the advantages and limitations of each approach, with a particular emphasis on traditional image processing techniques, model-based fusion methods, and the integration of emerging deep learning (DL) technologies. Comparative experimental analysis highlights performance differences among these methods in terms of information retention, computational efficiency, and clinical applicability. Finally, the paper reviews performance evaluation metrics for multimodal fusion and provides recommendations for future research to further promote the widespread adoption of this technology in clinical diagnostics and intelligent healthcare.

  • Research Article
  • Cited by 380
  • 10.1016/j.compbiomed.2022.105253
A review on multimodal medical image fusion: Compendious analysis of medical modalities, multimodal databases, fusion techniques and quality metrics
  • Feb 3, 2022
  • Computers in Biology and Medicine
  • Muhammad Adeel Azam + 7 more


  • Research Article
  • Cited by 32
  • 10.1016/j.jksuci.2023.101733
Multimodal medical image fusion towards future research: A review
  • Aug 29, 2023
  • Journal of King Saud University - Computer and Information Sciences
  • Sajid Ullah Khan + 5 more


  • Book Chapter
  • Cited by 4
  • 10.1016/b978-0-44-313233-9.00017-5
Chapter 11 - Deep learning-based multimodal medical image fusion
  • Jan 1, 2024
  • Data Fusion Techniques and Applications for Smart Healthcare
  • Aditya Kahol + 1 more


  • Research Article
  • Cited by 12
  • 10.3233/xst-210851
Deep learning supported disease detection with multi-modality image fusion.
  • Mar 29, 2021
  • Journal of X-Ray Science and Technology: Clinical Applications of Diagnosis and Therapeutics
  • F Sangeetha Francelin Vinnarasi + 3 more

Multi-modal image fusion techniques aid medical experts in disease diagnosis by providing complementary information from multi-modal medical images. These techniques enhance the effectiveness of medical disorder analysis and the classification of results. This study proposes a novel deep learning technique for the fusion of multi-modal medical images. The modified 2D Adaptive Bilateral Filters (M-2D-ABF) algorithm is used in image pre-processing to filter various types of noise. Contrast and brightness are improved by applying the proposed Energy-based CLAHE algorithm in order to preserve the high-energy regions of the multimodal images. Images from two different modalities are first registered using mutual information, and the registered images are then fused to form a single image. In the proposed fusion scheme, images are fused using a Siamese Neural Network and Entropy (SNNE)-based image fusion algorithm. In particular, the medical images are fused using a Siamese convolutional neural network structure and the entropy of the images. Fusion is performed on the basis of the score of the SoftMax layer and the entropy of the image. The fused image is segmented using the Fast Fuzzy C Means Clustering Algorithm (FFCMC) and Otsu Thresholding. Finally, various features are extracted from the segmented regions. Using the extracted features, classification is performed with a Logistic Regression classifier. Evaluation is performed using a publicly available benchmark dataset. Experimental results using various pairs of multi-modal medical images reveal that the proposed multi-modal image fusion and classification techniques compete with the existing state-of-the-art techniques reported in the literature.
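
The Otsu thresholding step mentioned in this abstract can be sketched in a few lines. This is a minimal, standard-library illustration of the classic between-class-variance criterion, not the authors' implementation; the function name `otsu_threshold`, the flattened list-of-integers image layout, and the 256-level default are assumptions made for the example.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes between-class variance.

    `pixels` is a flat list of integer gray levels in [0, levels).
    Pixels <= t form the lower class, pixels > t the upper class.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the lower class
    sum0 = 0  # sum of intensities in the lower class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # lower class still empty
        w1 = total - w0
        if w1 == 0:
            break     # upper class empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1/total^2.
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


# Toy "fused image": a dark cluster and a bright cluster.
toy = [10, 12, 11, 200, 205, 198]
t = otsu_threshold(toy)
mask = [p > t for p in toy]  # True marks the bright region
```

On the toy input the threshold falls between the two clusters, so the mask separates the dark pixels from the bright ones.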

  • Research Article
  • Cited by 6
  • 10.1142/s0219467823400053
Convolutional Neural Networks (CNN) with Quantum-Behaved Particle Swarm Optimization (QPSO)-Based Medical Image Fusion
  • Aug 22, 2022
  • International Journal of Image and Graphics
  • A Ancy Mergin + 1 more

Medical image fusion is the process of combining images from various imaging modalities to create a single image that can be used in clinical settings. Robust methods for merging image data from several modalities are being developed in the field of multimodal medical imaging. Deep learning (DL) has been widely researched in two areas: pattern recognition and image processing. We demonstrate a DL-based multimodal image fusion implementation that considers the characteristics of medical diagnostic imaging as well as the demands of clinical practice. For the past three years, pixel-level image fusion has been a hot topic. This paper proposes a new multimodal medical image fusion technique for a wide range of medical diagnostic challenges. Image fusion is crucial in biomedical research and clinical diagnostics for biomedical image processing and therapy planning. The most convincing argument for fusion is that it captures a significant amount of critical information from the input images. In this study, we show how a well-organized multimodal medical image fusion technique can be used to integrate computed tomography (CT) and magnetic resonance imaging (MRI) data. The method combines convolutional neural networks (CNNs) with the quantum-behaved particle swarm optimization (QPSO) algorithm to integrate multimodal medical images. To improve the overall quality and efficiency of QPSO, the metrics of image entropy, standard deviation, average gradient (AG), spatial frequency (SF), and visual information fidelity (VIF) were added. In experiments, multimodal medical images are used to evaluate a variety of parameters, including performance and algorithm stability. In these evaluations, the proposed technique outperformed the alternatives on a range of quantitative metrics.
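
Three of the no-reference metrics named in this abstract (image entropy, spatial frequency, and average gradient) have widely used textbook definitions, sketched below with the standard library only. The exact variants used in the paper are not stated in the abstract, so the formulas, function names, and the nested-list image layout here are assumptions for illustration.

```python
import math


def entropy(img, levels=256):
    """Shannon entropy (in bits) of the gray-level histogram."""
    hist = [0] * levels
    n = 0
    for row in img:
        for p in row:
            hist[p] += 1
            n += 1
    return -sum((h / n) * math.log2(h / n) for h in hist if h)


def spatial_frequency(img):
    """sqrt(RF^2 + CF^2), built from squared row and column differences."""
    rows, cols = len(img), len(img[0])
    rf = sum((img[i][j] - img[i][j - 1]) ** 2
             for i in range(rows) for j in range(1, cols))
    cf = sum((img[i][j] - img[i - 1][j]) ** 2
             for i in range(1, rows) for j in range(cols))
    return math.sqrt((rf + cf) / (rows * cols))


def average_gradient(img):
    """Mean of sqrt((dx^2 + dy^2) / 2) over forward differences."""
    rows, cols = len(img), len(img[0])
    total = 0.0
    for i in range(rows - 1):
        for j in range(cols - 1):
            dx = img[i][j + 1] - img[i][j]
            dy = img[i + 1][j] - img[i][j]
            total += math.sqrt((dx * dx + dy * dy) / 2)
    return total / ((rows - 1) * (cols - 1))


flat = [[5, 5], [5, 5]]          # no detail at all
checker = [[0, 255], [255, 0]]   # maximal local contrast
scores = {name: (entropy(im), spatial_frequency(im), average_gradient(im))
          for name, im in (("flat", flat), ("checker", checker))}
```

All three metrics are zero for the flat image and grow with local contrast, which is why they are commonly used as proxies for how much detail a fused image retains.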

  • Research Article
  • Cited by 159
  • 10.1016/j.ins.2021.04.052
Multimodal medical image fusion based on joint bilateral filter and local gradient energy
  • Apr 20, 2021
  • Information Sciences
  • Xiaosong Li + 4 more


  • Research Article
  • 10.3389/fimag.2026.1752625
Enhancement of multi-objective Darwinian particle swarm optimization for neural-network-based multimodal medical image fusion
  • Feb 26, 2026
  • Frontiers in Imaging
  • Chisom E Ogbuanya

The purpose of this research is to develop a multimodal medical image fusion method that will provide high-performance fusion images at a speed high enough for efficient real-time image-guided surgeries. This paper therefore proposes an improved multi-objective Darwinian particle swarm optimization method that incorporates a fractional calculus operator for effective multimodal medical image fusion. This is because multimodal medical image fusion is essential in many clinical diagnoses, and it represents a multi-objective problem due to the important objective indicators for measuring its efficiencies, such as the parameters of the neural network and the speed of the fusion process. The proposed method aims to optimize the Tsallis cross-entropy as a stimulating input to the pulse-coupled neural network (PCNN) for multimodal image fusion. In this work, multi-objective Darwinian particle swarm optimization (MODPSO) is utilized due to its ability to escape local optima more effectively than classical multi-objective particle swarm optimization (MOPSO). The approach uses the fact that the convergence rate of MODPSO is improved by introducing a fractional calculus operator, which is incorporated into the updating formulas for the velocity and position of the particles. The PCNN output serves as an optimal parameter for fusing the high-frequency coefficients of decomposed source images, which are initially decomposed into low- and high-frequency subbands. The low-frequency coefficients are fused using an averaging method. Results obtained in this paper show that the proposed method yields the highest average accuracy of 90.7% after a three-fold cross-validation was carried out with a small dataset extracted from a larger available dataset. In conclusion, the experimental results demonstrate the superiority of the proposed method over comparative methods in terms of both visual quality and quantitative evaluation.

  • Research Article
  • Cited by 159
  • 10.1016/j.bspc.2021.102480
An image quality enhancement scheme employing adolescent identity search algorithm in the NSST domain for multimodal medical image fusion
  • Feb 19, 2021
  • Biomedical Signal Processing and Control
  • Jais Jose + 6 more


  • Research Article
  • Cited by 5
  • 10.3390/s24113545
CIRF: Coupled Image Reconstruction and Fusion Strategy for Deep Learning Based Multi-Modal Image Fusion.
  • May 30, 2024
  • Sensors (Basel, Switzerland)
  • Junze Zheng + 3 more

Multi-modal medical image fusion (MMIF) is crucial for disease diagnosis and treatment because the images reconstructed from signals collected by different sensors can provide complementary information. In recent years, deep learning (DL) based methods have been widely used in MMIF. However, these methods often adopt a serial fusion strategy without feature decomposition, causing error accumulation and confusion of characteristics across different scales. To address these issues, we have proposed the Coupled Image Reconstruction and Fusion (CIRF) strategy. Our method parallels the image fusion and reconstruction branches which are linked by a common encoder. Firstly, CIRF uses the lightweight encoder to extract base and detail features, respectively, through the Vision Transformer (ViT) and the Convolutional Neural Network (CNN) branches, where the two branches interact to supplement information. Then, two types of features are fused separately via different blocks and finally decoded into fusion results. In the loss function, both the supervised loss from the reconstruction branch and the unsupervised loss from the fusion branch are included. As a whole, CIRF increases its expressivity by adding multi-task learning and feature decomposition. Additionally, we have also explored the impact of image masking on the network's feature extraction ability and validated the generalization capability of the model. Through experiments on three datasets, it has been demonstrated both subjectively and objectively, that the images fused by CIRF exhibit appropriate brightness and smooth edge transition with more competitive evaluation metrics than those fused by several other traditional and DL-based methods.

  • Book Chapter
  • Cited by 3
  • 10.1016/b978-0-44-313233-9.00010-2
Chapter 4 - Robust watermarking algorithm based on multimodal medical image fusion
  • Jan 1, 2024
  • Data Fusion Techniques and Applications for Smart Healthcare
  • Om Prakash Singh + 4 more


  • Research Article
  • Cited by 37
  • 10.1016/j.bspc.2021.103214
Multimodal image fusion and denoising in NSCT domain using CNN and FOTGV
  • Oct 13, 2021
  • Biomedical Signal Processing and Control
  • Sonal Goyal + 3 more


  • Book Chapter
  • Cited by 2
  • 10.1007/978-3-030-64559-5_13
Multi-modal Image Fusion Based on Weight Local Features and Novel Sum-Modified-Laplacian in Non-subsampled Shearlet Transform Domain
  • Jan 1, 2020
  • Hajer Ouerghi + 2 more

Multi-modal medical image fusion plays a significant role in clinical applications like noninvasive diagnosis and image-guided surgery. However, designing an efficient image fusion technique is still a challenging task. In this paper, we propose an improved multi-modal medical image fusion method to enhance the visual quality and contrast of the fused image. To achieve this, the registered source images are first decomposed into low-frequency (LF) and several high-frequency (HF) sub-images via the non-subsampled shearlet transform (NSST). Afterward, LF sub-images are combined using the proposed weight local features fusion rule based on local energy and standard deviation, while HF sub-images are fused based on the novel sum-modified-Laplacian (NSML) technique. Finally, the inverse NSST is applied to reconstruct the fused image. Furthermore, the proposed method is extended to color multi-modal image fusion that effectively restrains color distortion and enhances spatial and spectral resolutions. To evaluate the performance, various experiments were conducted on different datasets of gray-scale and color images. Experimental results show that the proposed scheme achieves better performance than other state-of-the-art algorithms in both visual effects and objective criteria.
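
For orientation, the classic sum-modified-Laplacian (SML) activity measure with a choose-max rule looks roughly like the sketch below. It shows the standard SML, not the paper's novel variant (NSML), whose definition is not given in the abstract. The function names, the nested-list layout, the window radius, and the choice to evaluate the Laplacian only at interior pixels are all assumptions for the example.

```python
def modified_laplacian(img, i, j):
    """|2I - left - right| + |2I - up - down| at an interior pixel."""
    c = 2 * img[i][j]
    return (abs(c - img[i][j - 1] - img[i][j + 1])
            + abs(c - img[i - 1][j] - img[i + 1][j]))


def sml(img, i, j, radius=1):
    """Sum of modified-Laplacian values in a window centred on (i, j).

    The window is clipped so the Laplacian is only evaluated at
    interior pixels (an assumed border policy).
    """
    rows, cols = len(img), len(img[0])
    total = 0
    for y in range(max(1, i - radius), min(rows - 1, i + radius + 1)):
        for x in range(max(1, j - radius), min(cols - 1, j + radius + 1)):
            total += modified_laplacian(img, y, x)
    return total


def fuse_max_sml(a, b, radius=1):
    """Per pixel, keep the coefficient whose neighbourhood has larger SML."""
    rows, cols = len(a), len(a[0])
    return [[a[i][j] if sml(a, i, j, radius) >= sml(b, i, j, radius)
             else b[i][j]
             for j in range(cols)]
            for i in range(rows)]


detail = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]   # sub-band with a sharp feature
smooth = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]   # featureless sub-band
fused = fuse_max_sml(detail, smooth)
```

Because the SML responds to local second-order variation, the rule favours the sub-band that carries edges or fine texture at each location.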

  • Research Article
  • 10.55463/issn.1674-2974.49.3.13
Spare Theory for the Detection of Brain Tumor using Multimodal Medical Image Fusion
  • Mar 28, 2022
  • Journal of Hunan University Natural Sciences
  • S L Jany Shabu + 5 more

The fusion of multimodal images is a trending research area, especially in the field of medical image processing. The purpose of image fusion here is to classify medical images efficiently. The objective of this research is to fuse multimodal medical images for medical image classification. A new algorithm is proposed for the detection of brain tumors based on three main steps: fusion, segmentation, and classification. A sparse theory-based vector selection (STVS) algorithm is proposed for image fusion. In this algorithm, the multimodal images are first converted into patches. These patches are then vectorized. The vectorized patches are employed in the creation of dictionaries. The generated dictionaries, along with the vectorized patches, are used to create sparse matrices. From the sparse matrices, a selection vector is formed, from which the fused image is generated. The fused image is segmented using Intuitionistic fuzzy set-based k-means (IFSKM) clustering and the Otsu thresholding technique. The clusters of the IFSKM are generated based on the Intuitionistic fuzzy set (IFS) scheme. Finally, classification is performed based on a DCNN architecture. The proposed system is validated using brain images from the Harvard Medical School. Quantitative analysis reveals that the proposed scheme achieves the best performance in terms of fusion, segmentation, and classification. The proposed STVS scheme attained values of entropy, standard deviation, PSNR in dB, mean square error (MSE), structural similarity index (SSIM), and homogeneity of 7.33, 55.25, 42.85, 0.098, 64.31, and 53.52, respectively.

  • Research Article
  • Cited by 1
  • 10.1166/jmihi.2021.3763
Adaptive Multimodal Image Fusion with a Deep Pyramidal Residual Learning Network
  • Aug 1, 2021
  • Journal of Medical Imaging and Health Informatics
  • Kiranmai Bellam + 4 more

Multimodal medical imaging is an indispensable requirement in the treatment of various pathologies to accelerate care. Rather than discrete images, a composite image combining complementary features from multimodal images is highly informative for clinical examinations, surgical planning, and progress monitoring. In this paper, a deep learning fusion model is proposed for the fusion of medical multimodal images. Based on pyramidal and residual learning units, the proposed model, strengthened with adaptive fusion rules, is tested on image pairs from a standard dataset. The potential of the proposed model for enhanced image exams is shown by fusion studies with deep network images and quantitative output metrics of magnetic resonance imaging and positron emission tomography (MRI/PET) and magnetic resonance imaging and single-photon emission computed tomography (MRI/SPECT). The proposed fusion model achieves Structural Similarity Index Measure (SSIM) values of 0.9502 and 0.8103 for the MRI/SPECT and MRI/PET image sets, signifying the perceptual visual consistency of the fused images. Testing is performed on 20 pairs of MRI/SPECT and MRI/PET images. Similarly, Mutual Information (MI) values of 2.7455 and 2.7776 were obtained for the MRI/SPECT and MRI/PET image sets, indicating the model's ability to carry the information content of the source images into the composite image. Further, the proposed model allows its variants to be deployed, introducing refinements to the basic model suitable for the fusion of low- and high-resolution medical images of diverse modalities.
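
The Mutual Information score reported above is usually computed from joint gray-level histograms. The sketch below is a minimal standard-library version under stated assumptions: images are flattened lists of equal length, the logarithm is base 2, and the fusion score is taken as MI(A, F) + MI(B, F), a common convention that the abstract does not confirm. The names `mutual_information` and `fusion_mi` are hypothetical.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information (bits) between two flattened images of equal size."""
    n = len(x)
    joint = Counter(zip(x, y))   # joint gray-level counts
    px = Counter(x)              # marginal counts for x
    py = Counter(y)              # marginal counts for y
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        # p(a,b) / (p(a) p(b)) simplifies to c * n / (px[a] * py[b]).
        mi += p_ab * math.log2(c * n / (px[a] * py[b]))
    return mi


def fusion_mi(src_a, src_b, fused):
    """Assumed fusion score: information the fused image shares with each source."""
    return mutual_information(src_a, fused) + mutual_information(src_b, fused)


mri = [0, 0, 1, 1]
pet = [0, 1, 0, 1]
fused = [0, 0, 1, 1]     # toy case: the fused image copies the first source
score = fusion_mi(mri, pet, fused)
```

In this toy case the fused image shares one full bit with the first source and nothing with the second, so a higher score reflects more source information carried into the composite.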
