A Review of Deep Learning-based Multi-modal Medical Image Fusion
Introduction: Medical image fusion combines data obtained from different imaging modalities, such as Computed Tomography (CT), Positron Emission Tomography (PET), and Magnetic Resonance Imaging (MRI), into a single, informative image that aids clinicians in diagnosis and treatment planning. No single imaging modality can provide complete information on its own. This has led to the emergence of a research field focused on integrating data from multiple modalities to maximize information in a single, unified representation.
Methods: A Convolutional Neural Network (CNN) was applied to achieve robust and effective multi-modal image fusion. In addition to examining the principles and practical applications of this deep learning approach, the paper provides a comparative analysis of CNN-based results against conventional fusion techniques.
Results: CNN-based image fusion delivers far better results than conventional fusion methods in both qualitative and quantitative analysis. The paper also discusses future perspectives, emphasizing advancements in deep learning that could drive the evolution of CNN-based fusion and enhance its effectiveness in medical imaging.
Discussion: CNN-based multi-modal medical image fusion shows strong advantages over traditional methods in terms of feature preservation and adaptability. However, challenges such as data dependency, computational complexity, and generalization across modalities persist. Emerging trends such as attention mechanisms and transformer models show promise in addressing these gaps. Future work should focus on improving interpretability and clinical applicability, ensuring that deep learning fusion methods can be reliably integrated into real-world diagnostic systems.
Conclusion: This work underscores the potential of CNN-based fusion to improve patient outcomes and shape the future of medical imaging by advancing the understanding of multi-modal fusion.
- Research Article
- 10.61189/617079irudnn
- Dec 31, 2025
- Perioperative Precision Medicine
Multimodal medical image fusion technology optimizes image content by integrating images from diverse modalities, such as Computed Tomography (CT), Positron Emission Tomography (PET), Magnetic Resonance Imaging (MRI), and Single Photon Emission Computed Tomography (SPECT), while retaining critical information. With the rapid advancements in medical imaging technology, single-modal approaches have limitations in capturing comprehensive anatomical or functional characteristics. As a result, researchers are increasingly turning to multimodal fusion methods to enhance diagnostic accuracy and provide richer data for classification, detection, and segmentation tasks. In particular, during the perioperative period, multimodal image fusion plays a crucial role in surgical planning, intraoperative navigation, and postoperative evaluation, enabling precise localization of lesions and improving clinical decision-making. This paper presents a survey of the latest literature on medical image fusion, covering three major approaches: traditional methods, model-based methods, and learning-based methods. It discusses the advantages and limitations of each approach, with a particular emphasis on traditional image processing techniques, model-based fusion methods, and the integration of emerging deep learning (DL) technologies. Comparative experimental analysis highlights performance differences among these methods in terms of information retention, computational efficiency, and clinical applicability. Finally, the paper reviews performance evaluation metrics for multimodal fusion and provides recommendations for future research to further promote the widespread adoption of this technology in clinical diagnostics and intelligent healthcare.
- Research Article
380
- 10.1016/j.compbiomed.2022.105253
- Feb 3, 2022
- Computers in Biology and Medicine
A review on multimodal medical image fusion: Compendious analysis of medical modalities, multimodal databases, fusion techniques and quality metrics
- Research Article
32
- 10.1016/j.jksuci.2023.101733
- Aug 29, 2023
- Journal of King Saud University - Computer and Information Sciences
Multimodal medical image fusion towards future research: A review
- Book Chapter
4
- 10.1016/b978-0-44-313233-9.00017-5
- Jan 1, 2024
- Data Fusion Techniques and Applications for Smart Healthcare
Chapter 11 - Deep learning-based multimodal medical image fusion
- Research Article
12
- 10.3233/xst-210851
- Mar 29, 2021
- Journal of X-Ray Science and Technology: Clinical Applications of Diagnosis and Therapeutics
Multi-modal image fusion techniques aid medical experts in better disease diagnosis by providing adequate complementary information from multi-modal medical images. These techniques enhance the effectiveness of medical disorder analysis and classification of results. This study proposes a novel deep learning technique for the fusion of multi-modal medical images. The modified 2D Adaptive Bilateral Filters (M-2D-ABF) algorithm is used in image pre-processing to filter various types of noise. Contrast and brightness are improved by applying the proposed Energy-based CLAHE algorithm, which preserves the high-energy regions of the multimodal images. Images from two different modalities are first registered using mutual information, and the registered images are then fused to form a single image. In the proposed scheme, images are fused using the Siamese Neural Network and Entropy (SNNE)-based image fusion algorithm: fusion is driven by the score of the SoftMax layer of a Siamese convolutional neural network and by the entropy of the images. The fused image is segmented using the Fast Fuzzy C Means Clustering Algorithm (FFCMC) and Otsu thresholding. Finally, various features are extracted from the segmented regions and classified using a Logistic Regression classifier. Evaluation is performed on a publicly available benchmark dataset. Experimental results on various pairs of multi-modal medical images reveal that the proposed multi-modal image fusion and classification techniques compete with the existing state-of-the-art techniques reported in the literature.
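The Otsu thresholding step mentioned in the abstract above can be illustrated with a minimal NumPy sketch. This is the textbook Otsu method for an 8-bit greyscale image, not the paper's implementation; the function name `otsu_threshold` and the convention that pixels strictly greater than the threshold form the foreground are assumptions made here for illustration.

```python
import numpy as np

def otsu_threshold(img):
    """Return the grey level t that maximises between-class variance (8-bit input)."""
    hist = np.bincount(img.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()                  # grey-level probabilities
    omega = np.cumsum(p)                   # probability of class 0 (levels <= t)
    mu = np.cumsum(p * np.arange(256))     # cumulative first moment up to level t
    mu_total = mu[-1]                      # global mean grey level
    denom = omega * (1.0 - omega)
    # Between-class variance; levels where one class is empty are scored 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = np.where(denom > 0, (mu_total * omega - mu) ** 2 / denom, 0.0)
    return int(np.argmax(sigma_b))

# Usage (assumed convention): foreground mask = pixels brighter than the threshold.
# mask = fused_image > otsu_threshold(fused_image)
```

In a pipeline like the one described, the mask would be computed on the fused image before feature extraction; how the paper combines it with the fuzzy clustering output is not stated in the abstract.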
- Research Article
6
- 10.1142/s0219467823400053
- Aug 22, 2022
- International Journal of Image and Graphics
Medical image fusion is the process of combining images from various imaging modalities to create a single image that can be used in clinical settings. Robust methods for merging image data from several modalities are being developed in the field of multimodal medical imaging. Deep learning (DL) has been widely researched in two areas: pattern recognition and image processing. We demonstrate a DL-based multimodal image fusion implementation that considers the characteristics of medical diagnostic imaging as well as the demands of clinical practice. Pixel-level image fusion has been an active topic over the past three years. This paper proposes a new multimodal medical image fusion technique for a wide range of medical diagnostic challenges. Image fusion is crucial in biomedical research and clinical diagnostics for biomedical image processing and therapy planning. The most convincing argument for fusion is that it retains a significant amount of critical information from the input images. In this study, we show how a well-organized multimodal medical image fusion technique can be used to integrate computed tomography (CT) and magnetic resonance imaging (MRI) data. The method combines convolutional neural networks (CNNs) with the quantum-behaved particle swarm optimization (QPSO) algorithm to fuse multimodal medical images. To improve the overall quality and efficiency of QPSO, the metrics of image entropy, standard deviation, average gradient (AG), spatial frequency (SF), and visual information fidelity (VIF) were added. In experiments, multimodal medical images are used to evaluate a variety of parameters, including performance and algorithm stability. In these evaluations, the proposed technique outperformed the compared methods on a range of quantitative metrics.
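Three of the metrics named in the abstract above (image entropy, average gradient, and spatial frequency) have widely used definitions that can be sketched in a few lines of NumPy. The definitions below are the common textbook forms and are an assumption here, since the abstract does not state the exact formulas the paper uses; the function names are illustrative. Standard deviation is simply `np.std(img)`, and VIF is omitted because it requires a multi-scale model.

```python
import numpy as np

def entropy(img, bins=256):
    """Shannon entropy in bits of the grey-level histogram of an 8-bit image."""
    hist, _ = np.histogram(img, bins=bins, range=(0, 256))
    p = hist[hist > 0] / hist.sum()
    return float(-np.sum(p * np.log2(p)))

def average_gradient(img):
    """Mean of sqrt((gx^2 + gy^2) / 2) using forward differences."""
    f = img.astype(np.float64)
    gx = f[:-1, 1:] - f[:-1, :-1]   # horizontal forward difference
    gy = f[1:, :-1] - f[:-1, :-1]   # vertical forward difference
    return float(np.mean(np.sqrt((gx ** 2 + gy ** 2) / 2.0)))

def spatial_frequency(img):
    """sqrt(RF^2 + CF^2), with RF/CF the RMS of row-wise/column-wise differences."""
    f = img.astype(np.float64)
    rf = np.sqrt(np.mean((f[:, 1:] - f[:, :-1]) ** 2))  # row frequency
    cf = np.sqrt(np.mean((f[1:, :] - f[:-1, :]) ** 2))  # column frequency
    return float(np.sqrt(rf ** 2 + cf ** 2))
```

Higher values of all three are usually read as more information or sharper detail in the fused image, which is why such metrics are natural candidates for an optimizer's objective.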
- Research Article
159
- 10.1016/j.ins.2021.04.052
- Apr 20, 2021
- Information Sciences
Multimodal medical image fusion based on joint bilateral filter and local gradient energy
- Research Article
- 10.3389/fimag.2026.1752625
- Feb 26, 2026
- Frontiers in Imaging
The purpose of this research is to develop a multimodal medical image fusion method that provides high-performance fusion images at a speed high enough for efficient real-time image-guided surgeries. This paper therefore proposes an improved multi-objective Darwinian particle swarm optimization method that incorporates a fractional calculus operator for effective multimodal medical image fusion. Multimodal medical image fusion is essential in many clinical diagnoses, and it is a multi-objective problem because several indicators determine its efficiency, such as the parameters of the neural network and the speed of the fusion process. The proposed method aims to optimize the Tsallis cross-entropy as a stimulating input to the pulse-coupled neural network (PCNN) for multimodal image fusion. In this work, multi-objective Darwinian particle swarm optimization (MODPSO) is used because it escapes local optima more effectively than classical multi-objective particle swarm optimization (MOPSO). The convergence rate of MODPSO is further improved by a fractional calculus operator, which is incorporated into the updating formulas for the velocity and position of the particles. The source images are first decomposed into low- and high-frequency subbands. The PCNN output serves as an optimal parameter for fusing the high-frequency coefficients, while the low-frequency coefficients are fused using an averaging method. Results obtained in this paper show that the proposed method yields the highest average accuracy of 90.7% after a three-fold cross-validation was carried out with a small dataset extracted from a larger available dataset. In conclusion, the experimental results demonstrate the superiority of the proposed method over comparative methods in terms of both visual quality and quantitative evaluation.
- Research Article
159
- 10.1016/j.bspc.2021.102480
- Feb 19, 2021
- Biomedical Signal Processing and Control
An image quality enhancement scheme employing adolescent identity search algorithm in the NSST domain for multimodal medical image fusion
- Research Article
5
- 10.3390/s24113545
- May 30, 2024
- Sensors (Basel, Switzerland)
Multi-modal medical image fusion (MMIF) is crucial for disease diagnosis and treatment because the images reconstructed from signals collected by different sensors can provide complementary information. In recent years, deep learning (DL) based methods have been widely used in MMIF. However, these methods often adopt a serial fusion strategy without feature decomposition, causing error accumulation and confusion of characteristics across different scales. To address these issues, we propose the Coupled Image Reconstruction and Fusion (CIRF) strategy. Our method runs the image fusion and reconstruction branches in parallel, linked by a common encoder. First, CIRF uses the lightweight encoder to extract base and detail features through the Vision Transformer (ViT) and the Convolutional Neural Network (CNN) branches, respectively, where the two branches interact to supplement information. Then, the two types of features are fused separately via different blocks and finally decoded into fusion results. The loss function includes both the supervised loss from the reconstruction branch and the unsupervised loss from the fusion branch. As a whole, CIRF increases its expressivity by adding multi-task learning and feature decomposition. Additionally, we explore the impact of image masking on the network's feature extraction ability and validate the generalization capability of the model. Experiments on three datasets demonstrate, both subjectively and objectively, that the images fused by CIRF exhibit appropriate brightness and smooth edge transition, with more competitive evaluation metrics than those fused by several other traditional and DL-based methods.
- Book Chapter
3
- 10.1016/b978-0-44-313233-9.00010-2
- Jan 1, 2024
- Data Fusion Techniques and Applications for Smart Healthcare
Chapter 4 - Robust watermarking algorithm based on multimodal medical image fusion
- Research Article
37
- 10.1016/j.bspc.2021.103214
- Oct 13, 2021
- Biomedical Signal Processing and Control
Multimodal image fusion and denoising in NSCT domain using CNN and FOTGV
- Book Chapter
2
- 10.1007/978-3-030-64559-5_13
- Jan 1, 2020
Multi-modal medical image fusion plays a significant role in clinical applications such as noninvasive diagnosis and image-guided surgery. However, designing an efficient image fusion technique is still a challenging task. In this paper, we propose an improved multi-modal medical image fusion method to enhance the visual quality and contrast of the fused image. To this end, the registered source images are first decomposed into low-frequency (LF) and several high-frequency (HF) sub-images via the non-subsampled shearlet transform (NSST). Afterward, LF sub-images are combined using the proposed weighted local features fusion rule based on local energy and standard deviation, while HF sub-images are fused based on the novel sum-modified-Laplacian (NSML) technique. Finally, the inverse NSST is applied to reconstruct the fused image. Furthermore, the proposed method is extended to color multi-modal image fusion in a way that effectively restrains color distortion and enhances spatial and spectral resolutions. To evaluate the performance, various experiments were conducted on different datasets of gray-scale and color images. Experimental results show that the proposed scheme achieves better performance than other state-of-the-art algorithms in both visual effects and objective criteria.
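To make the high-frequency fusion rule described above more concrete, here is a minimal sketch of the classic sum-modified-Laplacian (SML) activity measure with a per-pixel choose-max rule. It is not the paper's NSML variant, whose details are not given in the abstract, and it assumes the two inputs are already-decomposed high-frequency sub-bands of equal shape (the NSST itself is not implemented here). The function names, the edge-replicated border handling, and the window radius are assumptions for illustration.

```python
import numpy as np

def modified_laplacian(band):
    """|2f - left - right| + |2f - up - down|, with edge-replicated borders."""
    f = np.pad(band.astype(np.float64), 1, mode="edge")
    c = f[1:-1, 1:-1]
    ml_x = np.abs(2 * c - f[1:-1, :-2] - f[1:-1, 2:])
    ml_y = np.abs(2 * c - f[:-2, 1:-1] - f[2:, 1:-1])
    return ml_x + ml_y

def sum_modified_laplacian(band, radius=1):
    """Sum the modified Laplacian over a (2r+1) x (2r+1) window around each pixel."""
    ml = np.pad(modified_laplacian(band), radius, mode="constant")
    h, w = band.shape
    out = np.zeros((h, w))
    k = 2 * radius + 1
    for dy in range(k):
        for dx in range(k):
            out += ml[dy:dy + h, dx:dx + w]
    return out

def fuse_high_frequency(band_a, band_b, radius=1):
    """Keep, per pixel, the coefficient whose local SML activity is larger."""
    sml_a = sum_modified_laplacian(band_a, radius)
    sml_b = sum_modified_laplacian(band_b, radius)
    return np.where(sml_a >= sml_b, band_a, band_b)
```

The choose-max rule favours whichever source carries more local detail at each position, which is the usual motivation for SML-style measures in transform-domain fusion.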
- Research Article
- 10.55463/issn.1674-2974.49.3.13
- Mar 28, 2022
- Journal of Hunan University Natural Sciences
The fusion of multimodal images is a trending research area, especially in the field of medical image processing. In this work, multimodal medical images are fused to support efficient medical image classification. A new algorithm is proposed for the detection of brain tumors based on three main steps: fusion, segmentation, and classification. A sparse theory-based vector selection (STVS) algorithm is proposed for image fusion. In this algorithm, the multimodal images are first divided into patches, which are then vectorized. The vectorized patches are used to create dictionaries, and the dictionaries together with the vectorized patches are used to create sparse matrices. From the sparse matrices, a selection vector is formed, from which the fused image is generated. The fused image is segmented using Intuitionistic fuzzy set-based k-means (IFSKM) clustering and the Otsu thresholding technique. The clusters of the IFSKM are generated based on the Intuitionistic fuzzy set (IFS) scheme. Finally, classification is performed with a DCNN architecture. The proposed system is validated using the brain images from the Harvard Medical School. Quantitative analysis reveals that the proposed scheme achieves the best performance in terms of fusion, segmentation, and classification. The proposed STVS scheme attained values of 7.33, 55.25, 42.85, 0.098, 64.31, and 53.52 for entropy, standard deviation, PSNR in dB, mean square error (MSE), structural similarity index (SSIM), and homogeneity, respectively.
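For readers unfamiliar with two of the metrics reported above, MSE and PSNR can be sketched as follows. These are the standard definitions and are assumed here; the abstract does not state the intensity scale or peak value the authors used, so the `peak=255.0` default (8-bit images) and the function names are illustrative assumptions.

```python
import numpy as np

def mse(ref, test):
    """Mean squared error between two equally sized images."""
    diff = ref.astype(np.float64) - test.astype(np.float64)
    return float(np.mean(diff ** 2))

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    err = mse(ref, test)
    if err == 0:
        return float("inf")
    return float(10.0 * np.log10(peak ** 2 / err))
```

Because PSNR depends on the chosen peak value, numbers from different papers are only comparable when the intensity range is the same.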
- Research Article
1
- 10.1166/jmihi.2021.3763
- Aug 1, 2021
- Journal of Medical Imaging and Health Informatics
Multimodal medical imaging is an indispensable requirement in the treatment of various pathologies to accelerate care. Rather than discrete images, a composite image combining complementary features from multimodal images is highly informative for clinical examinations, surgical planning, and progress monitoring. In this paper, a deep learning fusion model is proposed for the fusion of medical multimodal images. Based on pyramidal and residual learning units, the proposed model, strengthened with adaptive fusion rules, is tested on image pairs from a standard dataset. The potential of the proposed model for enhanced image exams is shown by fusion studies with deep network images and quantitative output metrics of magnetic resonance imaging and positron emission tomography (MRI/PET) and magnetic resonance imaging and single-photon emission computed tomography (MRI/SPECT). The proposed fusion model achieves Structural Similarity Index Measure (SSIM) values of 0.9502 and 0.8103 for the MRI/SPECT and MRI/PET image sets, signifying the perceptual visual consistency of the fused images. Testing is performed on 20 pairs of MRI/SPECT and MRI/PET images. Similarly, Mutual Information (MI) values of 2.7455 and 2.7776 are obtained for the MRI/SPECT and MRI/PET image sets, indicating the model's ability to carry the information content of the source images into the composite image. Further, the proposed model allows its variants to be deployed, introducing refinements on the basic model suitable for the fusion of low- and high-resolution medical images of diverse modalities.
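The Mutual Information metric cited above is commonly estimated from the joint grey-level histogram of a source image and the fused image. The sketch below shows that standard estimator for 8-bit images; the logarithm base (bits), the bin count, and the summed score `fusion_mi` are assumptions for illustration, since the abstract does not specify how the reported MI values were computed.

```python
import numpy as np

def mutual_information(x, y, bins=256):
    """MI in bits between two equally sized 8-bit images, from their joint histogram."""
    joint, _, _ = np.histogram2d(
        x.ravel(), y.ravel(), bins=bins, range=[[0, 256], [0, 256]]
    )
    pxy = joint / joint.sum()                 # joint distribution
    px = pxy.sum(axis=1, keepdims=True)       # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)       # marginal of y, shape (1, bins)
    nz = pxy > 0                              # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))

def fusion_mi(src_a, src_b, fused):
    """A common fusion score: MI(A, F) + MI(B, F)."""
    return mutual_information(src_a, fused) + mutual_information(src_b, fused)
```

A larger score suggests that the fused image shares more grey-level information with the two sources, which matches how the abstract interprets its MI values.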