Multi-Modal Medical Image Fusion for Non-Small Cell Lung Cancer Classification
The early detection and nuanced subtype classification of non-small cell lung cancer (NSCLC), a predominant cause of cancer mortality worldwide, are critical and complex tasks. In this paper, we introduce an integration of multi-modal data, synthesizing fused medical imaging (CT and PET scans) with clinical health records and genomic data. This fusion methodology leverages advanced machine learning models, notably MedClip and BEiT, for image feature extraction. Our approach surpasses existing methods, as evidenced by substantial improvements in NSCLC detection and classification across key performance metrics, including accuracy, precision, recall, and F1-score. Specifically, our leading multi-modal classifier achieves an accuracy of $94.04\%$. We believe this approach has the potential to transform NSCLC diagnostics by facilitating earlier detection and more effective treatment planning, ultimately leading to better patient outcomes in lung cancer care.
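The abstract describes concatenating MedClip/BEiT image embeddings with clinical and genomic features ahead of a classifier, but gives no implementation detail. A minimal late-fusion sketch in PyTorch, in which the embedding dimensions, dropout rate, and two-way subtype split are illustrative assumptions rather than the paper's actual configuration:

```python
# Hedged sketch: a late-fusion classifier that concatenates image, clinical,
# and genomic feature vectors, as the abstract describes at a high level.
# The dimensions (512, 32, 64) and two output classes are assumptions.
import torch
import torch.nn as nn

class LateFusionNSCLCClassifier(nn.Module):
    def __init__(self, img_dim=512, clin_dim=32, gen_dim=64, n_classes=2):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(img_dim + clin_dim + gen_dim, 256),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(256, n_classes),
        )

    def forward(self, img_feat, clin_feat, gen_feat):
        # Concatenate per-modality embeddings into one joint representation.
        fused = torch.cat([img_feat, clin_feat, gen_feat], dim=-1)
        return self.head(fused)

# Example forward pass with random placeholder features.
model = LateFusionNSCLCClassifier()
logits = model(torch.randn(4, 512), torch.randn(4, 32), torch.randn(4, 64))
print(logits.shape)  # torch.Size([4, 2])
```

Feature-level (late) fusion like this is the simplest way to combine heterogeneous modalities; in the paper, the fused CT/PET imaging would enter through the image-embedding branch.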
- Research Article
- 10.61189/617079irudnn
- Dec 31, 2025
- Perioperative Precision Medicine
Multimodal medical image fusion technology optimizes image content by integrating images from diverse modalities, such as Computed Tomography (CT), Positron Emission Tomography (PET), Magnetic Resonance Imaging (MRI), and Single Photon Emission Computed Tomography (SPECT), while retaining critical information. With the rapid advancements in medical imaging technology, single-modal approaches have limitations in capturing comprehensive anatomical or functional characteristics. As a result, researchers are increasingly turning to multimodal fusion methods to enhance diagnostic accuracy and provide richer data for classification, detection, and segmentation tasks. In particular, during the perioperative period, multimodal image fusion plays a crucial role in surgical planning, intraoperative navigation, and postoperative evaluation, enabling precise localization of lesions and improving clinical decision-making. This paper presents a survey of the latest literature on medical image fusion, covering three major approaches: traditional methods, model-based methods, and learning-based methods. It discusses the advantages and limitations of each approach, with particular emphasis on the integration of emerging deep learning (DL) technologies. Comparative experimental analysis highlights performance differences among these methods in terms of information retention, computational efficiency, and clinical applicability. Finally, the paper reviews performance evaluation metrics for multimodal fusion and provides recommendations for future research to further promote the widespread adoption of this technology in clinical diagnostics and intelligent healthcare.
- Research Article
- 10.1016/j.cmpb.2025.108612
- Apr 1, 2025
- Computer methods and programs in biomedicine
M2OCNN: Many-to-One Collaboration Neural Networks for simultaneously multi-modal medical image synthesis and fusion.
- Research Article
- 10.3233/xst-210851
- Mar 29, 2021
- Journal of X-Ray Science and Technology: Clinical Applications of Diagnosis and Therapeutics
Deep learning supported disease detection with multi-modality image fusion.
- Research Article
- 10.1016/j.ins.2021.04.052
- Apr 20, 2021
- Information Sciences
Multimodal medical image fusion based on joint bilateral filter and local gradient energy
- Book Chapter
- 10.1007/978-981-19-5936-3_78
- Jan 1, 2023
Multimodal medical image fusion is a significant advancement in medical imaging methodology. Its primary goal is to gather useful information by merging several images obtained from different sources into a single image suitable for more reliable diagnosis, and this chapter provides an overview of fusion methods, including theoretical foundations and recent breakthroughs. Modalities such as Magnetic Resonance Imaging (MRI) and Computed Tomography (CT) aid professionals' decision making in the computer-aided diagnostic pipeline, and the fused image may also support downstream tasks such as classification, detection, and segmentation. The suggested technique first removes distortion from the source images, followed by image enhancement, weight extraction, weight-map computation and refinement, pyramid decomposition, and reconstruction of the fused output. The procedure is implemented in Matlab, where the authors can customize the parameter settings. The experiments focus on qualitative rather than quantitative analysis, and the chapter concludes that current multimodal medical image fusion findings are relevant and can be applied to diagnose patients successfully.
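The chapter's pipeline (weight-map computation, pyramid decomposition, reconstruction) is implemented in Matlab but not shown. A condensed Python sketch of the weight-map-plus-pyramid idea, assuming OpenCV and a simple local-contrast weight rather than the chapter's exact rules and parameter settings:

```python
# Hedged sketch of weight-map-guided Laplacian pyramid fusion.
# The contrast-based weight and 4-level pyramid are illustrative choices.
import cv2
import numpy as np

def laplacian_pyramid(img, levels=4):
    gp = [img]
    for _ in range(levels):
        gp.append(cv2.pyrDown(gp[-1]))
    lp = [gp[i] - cv2.pyrUp(gp[i + 1], dstsize=gp[i].shape[::-1])
          for i in range(levels)]
    lp.append(gp[-1])
    return lp

def fuse(a, b, levels=4):
    # Per-pixel weight map from smoothed local contrast (absolute Laplacian).
    wa = cv2.GaussianBlur(np.abs(cv2.Laplacian(a, cv2.CV_32F)), (11, 11), 0)
    wb = cv2.GaussianBlur(np.abs(cv2.Laplacian(b, cv2.CV_32F)), (11, 11), 0)
    w = wa / (wa + wb + 1e-8)
    lpa, lpb = laplacian_pyramid(a, levels), laplacian_pyramid(b, levels)
    fused_pyr = []
    for la, lb in zip(lpa, lpb):
        ws = cv2.resize(w, la.shape[::-1])  # match weight map to this level
        fused_pyr.append(ws * la + (1 - ws) * lb)
    # Collapse the pyramid back into a single fused image.
    out = fused_pyr[-1]
    for lap in reversed(fused_pyr[:-1]):
        out = cv2.pyrUp(out, dstsize=lap.shape[::-1]) + lap
    return out

ct = np.float32(np.random.rand(256, 256))   # placeholder registered CT slice
mri = np.float32(np.random.rand(256, 256))  # placeholder registered MRI slice
print(fuse(ct, mri).shape)  # (256, 256)
```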
- Research Article
- 10.1016/j.compbiomed.2020.103823
- Jun 20, 2020
- Computers in Biology and Medicine
Multi-modal medical image fusion by Laplacian pyramid and adaptive sparse representation
- Research Article
- 10.1038/s41598-025-22834-1
- Nov 6, 2025
- Scientific Reports
Multimodal medical image fusion plays an important role in clinical applications. However, existing multimodal fusion methods ignore the feature dependence among modalities, and their ability to fuse features at different granularities is weak. A long-range correlation-guided dual-encoder fusion network for medical images is proposed in this paper. The main innovations are as follows. First, a Cross-dimension Multi-scale Feature Extraction Module (CMFEM) is designed in the encoder; by extracting multi-scale features and aggregating coarse-to-fine features, the model achieves fine-grained feature enhancement within each modality. Second, a Long-range Correlation Fusion Module (LCFM) is designed; by computing the long-range correlation coefficient between local and global features, same-granularity features are fused, long-range dependencies between modalities are captured, and features of different granularity are aggregated. Finally, the method is validated on a clinical multimodal lung image dataset and a brain image dataset. On the lung dataset, the IE, AG, $Q^{AB/F}$, and EI metrics improve by 4.53%, 4.10%, 6.19%, and 6.62%, respectively; on the brain dataset, the SF, VIF, and $Q^{AB/F}$ metrics improve by 3.88%, 15.71%, and 7.99%, respectively. The model achieves better fusion performance, which plays an important role in the fusion of multimodal medical images.
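Two of the metrics cited above, information entropy (IE) and average gradient (AG), are standard and easy to reproduce. A small sketch, assuming an 8-bit grayscale fused image:

```python
# Hedged sketch of two common fusion metrics named in the abstract:
# information entropy (IE) and average gradient (AG).
import numpy as np

def information_entropy(img):
    # Shannon entropy of the 8-bit intensity histogram (bits).
    hist, _ = np.histogram(img, bins=256, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def average_gradient(img):
    # Mean magnitude of horizontal/vertical intensity differences.
    img = img.astype(np.float64)
    gx = np.diff(img, axis=1)[:-1, :]
    gy = np.diff(img, axis=0)[:, :-1]
    return float(np.mean(np.sqrt((gx**2 + gy**2) / 2.0)))

fused = (np.random.rand(256, 256) * 255).astype(np.uint8)  # placeholder image
print(information_entropy(fused), average_gradient(fused))
```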
- Research Article
- 10.1109/jsen.2016.2533864
- May 1, 2016
- IEEE Sensors Journal
Multimodal medical image fusion plays a vital role in clinical imaging applications. This paper presents a novel multimodal medical image fusion method that combines multiscale geometric analysis via the nonsubsampled contourlet transform (NSCT) with type-2 fuzzy logic techniques. First, the NSCT is applied to preregistered source images to obtain their high- and low-frequency subbands. Next, an effective type-2 fuzzy logic-based fusion rule is proposed for the high-frequency subbands, in which local type-2 fuzzy entropy is introduced to automatically select high-frequency coefficients. The low-frequency subbands, in contrast, are fused by a local-energy algorithm based on each image's local features. Finally, the fused image is reconstructed by the inverse NSCT over all composite subbands. Both subjective and objective evaluations show better contrast, accuracy, and versatility for the proposed approach compared with state-of-the-art methods. In addition, an effective color medical image fusion scheme is presented that largely suppresses color distortion and produces an improved visual effect.
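NSCT implementations are not widely packaged for Python, but the paper's split strategy (activity-driven selection for high-frequency subbands, local energy for the low-frequency subband) can be sketched with an ordinary discrete wavelet transform from PyWavelets standing in for NSCT. Note that the absolute-maximum rule here replaces the paper's type-2 fuzzy entropy selection:

```python
# Hedged sketch: subband-wise fusion rules in a DWT domain (PyWavelets),
# standing in for the paper's NSCT + type-2 fuzzy logic framework.
import numpy as np
import pywt
from scipy.ndimage import uniform_filter

def fuse_dwt(a, b, wavelet="db2"):
    ca, (ch_a, cv_a, cd_a) = pywt.dwt2(a, wavelet)
    cb, (ch_b, cv_b, cd_b) = pywt.dwt2(b, wavelet)
    # Low-frequency: pick per coefficient by larger local energy (3x3 window).
    ea, eb = uniform_filter(ca**2, 3), uniform_filter(cb**2, 3)
    low = np.where(ea >= eb, ca, cb)
    # High-frequency: absolute-maximum selection per coefficient.
    highs = tuple(np.where(np.abs(ha) >= np.abs(hb), ha, hb)
                  for ha, hb in zip((ch_a, cv_a, cd_a), (ch_b, cv_b, cd_b)))
    return pywt.idwt2((low, highs), wavelet)

ct, mri = np.random.rand(128, 128), np.random.rand(128, 128)  # placeholders
print(fuse_dwt(ct, mri).shape)  # (128, 128)
```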
- Research Article
- 10.3389/fgene.2022.927222
- Jun 23, 2022
- Frontiers in Genetics
Multi-modal medical image fusion can reduce information redundancy, increase the interpretability of images, and provide medical staff with more detailed pathological information. However, most traditional methods treat the channels of multi-modal medical images as three independent grayscale images, which ignores the correlation between color channels and leads to color distortion, attenuation, and other artifacts in the reconstructed image. In this paper, we propose a multi-modal medical image fusion algorithm with geometric algebra based sparse representation (GA-SR). First, the multi-modal medical image is represented as a multi-vector, and the GA-SR model is introduced for fusion so that channel correlations are preserved. Second, an orthogonal matching pursuit algorithm based on geometric algebra (GAOMP) is introduced to obtain the sparse coefficient matrix, and a K-means clustering singular value decomposition algorithm based on geometric algebra (K-GASVD) is introduced to learn the geometric algebra dictionary and update both the coefficient matrix and the dictionary. Finally, the fused image is obtained as a linear combination of the geometric algebra dictionary and the coefficient matrix. Experimental results demonstrate that the proposed algorithm outperforms existing methods in subjective and objective quality evaluation, showing its effectiveness for multi-modal medical image fusion.
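The GA-SR pipeline (sparse coding over a learned dictionary, then fusion in coefficient space) follows the general shape of classical sparse-representation fusion. A much-simplified real-valued sketch with scikit-learn, in which ordinary OMP and dictionary learning stand in for the paper's geometric-algebra GAOMP and K-GASVD:

```python
# Hedged sketch of patch-based sparse-representation fusion: sparse-code
# patches of both sources over a shared dictionary, keep the coefficient
# vector with larger L1 activity, then reconstruct. Patch size, atom count,
# and sparsity level are illustrative assumptions.
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning, sparse_encode
from sklearn.feature_extraction.image import (extract_patches_2d,
                                              reconstruct_from_patches_2d)

def sr_fuse(a, b, patch=(8, 8), atoms=64):
    pa = extract_patches_2d(a, patch).reshape(-1, patch[0] * patch[1])
    pb = extract_patches_2d(b, patch).reshape(-1, patch[0] * patch[1])
    dico = MiniBatchDictionaryLearning(n_components=atoms, alpha=1.0,
                                       max_iter=50).fit(np.vstack([pa, pb]))
    D = dico.components_
    xa = sparse_encode(pa, D, algorithm="omp", n_nonzero_coefs=4)
    xb = sparse_encode(pb, D, algorithm="omp", n_nonzero_coefs=4)
    # Choose-max fusion rule on the L1 norm of each patch's sparse code.
    keep_a = np.abs(xa).sum(axis=1) >= np.abs(xb).sum(axis=1)
    fused_codes = np.where(keep_a[:, None], xa, xb)
    patches = (fused_codes @ D).reshape(-1, *patch)
    return reconstruct_from_patches_2d(patches, a.shape)

a, b = np.random.rand(64, 64), np.random.rand(64, 64)  # placeholder sources
print(sr_fuse(a, b).shape)  # (64, 64)
```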
- Book Chapter
- 10.1016/b978-0-44-313233-9.00017-5
- Jan 1, 2024
- Data Fusion Techniques and Applications for Smart Healthcare
Chapter 11 - Deep learning-based multimodal medical image fusion
- Research Article
- 10.1155/2022/6878783
- Apr 14, 2022
- Behavioural Neurology
Multimodal medical image fusion combines images from the same or different modalities to improve visual content and support further operations such as image segmentation. Biomedical research and medical image analysis rely heavily on fusion for higher-level analysis, as it lets practitioners visualize internal organs and tissues; for brain images, it allows hard structures such as the skull and soft tissue to be viewed simultaneously. Brain tumor segmentation can be performed more accurately on the fused image: the area of a tumor can be located precisely using the information from both Positron Emission Tomography (PET) and Magnetic Resonance Imaging (MRI) in a single image, which increases diagnostic accuracy and reduces the time needed to diagnose and localize the tumor. The functional information of the brain is available in PET, while the anatomy of brain tissue is available in MRI, so a robust fusion model can deliver spatial characteristics and functional information in one image. The proposed approach uses a generative adversarial network (GAN) to fuse PET and MRI into a single image, whose results can support further medical analysis, tumor localization, and surgical planning. The performance of the GAN-based model is evaluated using two metrics, structural similarity index and mutual information, achieving a structural similarity index of 0.8551 and mutual information of 2.8059.
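The two reported metrics are standard and straightforward to compute. A sketch, assuming 8-bit grayscale images and using scikit-image plus a histogram-based mutual information estimate (the abstract does not state which logarithm base or bin count was used):

```python
# Hedged sketch: evaluate a fused image against a source with the two
# metrics the abstract reports, structural similarity and mutual information.
import numpy as np
from skimage.metrics import structural_similarity

def mutual_information(x, y, bins=64):
    # Histogram-based MI estimate (nats) between two grayscale images.
    joint, _, _ = np.histogram2d(x.ravel(), y.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px, py = pxy.sum(axis=1, keepdims=True), pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

src = (np.random.rand(128, 128) * 255).astype(np.uint8)  # placeholder source
fused = (src * 0.7 + np.random.rand(128, 128) * 76).astype(np.uint8)
print(structural_similarity(src, fused), mutual_information(src, fused))
```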
- Research Article
- 10.1016/j.bspc.2021.102697
- May 5, 2021
- Biomedical Signal Processing and Control
Siamese networks and multi-scale local extrema scheme for multimodal brain medical image fusion
- Research Article
- 10.2174/0118750362370697250630063814
- Jul 4, 2025
- The Open Bioinformatics Journal
Introduction: Medical image fusion combines data obtained from different imaging modalities, such as Computed Tomography (CT), Positron Emission Tomography (PET), and Magnetic Resonance Imaging (MRI), into a single, informative image that aids clinicians in diagnosis and treatment planning. No single imaging modality can provide complete information on its own, which has led to a research field focused on integrating data from multiple modalities into one unified representation.
Methods: A Convolutional Neural Network (CNN) was applied to achieve robust and effective multi-modal image fusion. Alongside the principles and practical applications of this deep learning approach, the paper provides a comparative analysis of CNN-based results against conventional fusion techniques.
Results: CNN-based image fusion delivers far better results in both qualitative and quantitative analysis than conventional fusion methods. The paper also discusses future perspectives, emphasizing advancements in deep learning that could drive the evolution of CNN-based fusion and enhance its effectiveness in medical imaging.
Discussion: CNN-based multi-modal medical image fusion shows strong advantages over traditional methods in feature preservation and adaptability, but challenges such as data dependency, computational complexity, and generalization across modalities persist. Emerging approaches such as attention mechanisms and transformer models show promise in addressing these gaps; future work should focus on interpretability and clinical applicability so that deep learning fusion methods can be reliably integrated into real-world diagnostic systems.
Conclusion: This work underscores the potential of CNN-based fusion to improve patient outcomes and shape the future of medical imaging by advancing the understanding of multi-modal fusion.
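The review does not commit to one architecture, but the typical CNN fusion pattern it surveys (per-modality feature extraction, feature merging, reconstruction) is easy to illustrate. A deliberately minimal PyTorch sketch, with illustrative channel widths and an element-wise max merge chosen here for simplicity:

```python
# Hedged sketch of a typical CNN fusion architecture: one small encoder per
# modality, element-wise max merging of feature maps, and a conv decoder.
import torch
import torch.nn as nn

class TinyFusionCNN(nn.Module):
    def __init__(self, ch=16):
        super().__init__()
        def encoder():
            return nn.Sequential(
                nn.Conv2d(1, ch, 3, padding=1), nn.ReLU(),
                nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            )
        self.enc_a, self.enc_b = encoder(), encoder()
        self.dec = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, a, b):
        # Merge per-modality feature maps by element-wise maximum.
        return self.dec(torch.maximum(self.enc_a(a), self.enc_b(b)))

model = TinyFusionCNN()
out = model(torch.rand(1, 1, 128, 128), torch.rand(1, 1, 128, 128))
print(out.shape)  # torch.Size([1, 1, 128, 128])
```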
- Conference Article
- 10.1145/2425333.2425405
- Dec 16, 2012
Medical image fusion needs careful treatment because images obtained from medical instruments often have poor contrast and are corrupted by blur and noise due to imperfections in the capture devices, so objective evaluation of fusion techniques in the noisy domain has become an important task. In the present work, we propose maximum-selection and energy-based fusion rules for noisy multimodal medical image fusion using the Daubechies complex wavelet transform (DCxWT). Unlike traditional real-valued wavelet transforms, which suffer from shift sensitivity and provide no phase information, DCxWT is shift invariant and provides phase information through its imaginary coefficients; both properties prove useful for fusing multimodal medical images. Experiments were performed over several sets of noisy medical images at multiple noise levels, and the proposed scheme was tested up to the maximum level of Gaussian, salt-and-pepper, and speckle noise. Objective evaluation uses fusion factor, fusion symmetry, entropy, standard deviation, and edge-information metrics. Results are shown for two sets of multimodal medical images with the maximum and energy-based fusion rules, and comparisons with Lifting wavelet transform (LWT) and Stationary wavelet transform (SWT) based fusion methods at different noise levels show the superiority of the proposed scheme. Plots of the fusion metrics against the maximum levels of Gaussian, salt-and-pepper, and speckle noise further demonstrate the method's robustness to noise.
- Conference Article
- 10.1117/12.2579844
- Nov 10, 2020
Multimodal medical image fusion extracts information from images of different modalities into a single image that captures the organizational characteristics of the sources. Different imaging measures such as CT and MRI produce different visual morphology, yet the salient features of tissues are essentially the same from the perspective of the human eye. Exploiting this, an improved image fusion algorithm based on visual salience detection is proposed in this paper. First, the GBVS algorithm is introduced to calculate the visual salience of the two registered source images, which are then decomposed in the NSST domain into low-frequency and high-frequency sub-bands. For the low-frequency sub-bands, local energy and the GBVS map are fed into a fuzzy logic system to obtain the weights for the fused low-frequency sub-band. For the high-frequency sub-bands, the NSML values of each sub-band are calculated and compared to obtain the fused high-frequency sub-band. The final fused image is obtained by the inverse NSST transform. Applied to multimodal medical image fusion, this method effectively enhances the visual quality of the image and preserves the salient features of tissues. Experiments on multimodal fusion of different gray-scale medical images show that the proposed method has advantages in retaining salient image features and overall contrast, and achieves better objective indices than the comparison models.
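Of the pieces described, the NSML comparison for the high-frequency sub-bands is the most self-contained. A sketch of a sum-modified-Laplacian activity measure with a choose-max rule, assuming plain NumPy arrays for two corresponding sub-bands (the NSST decomposition and the fuzzy low-frequency weighting are out of scope here):

```python
# Hedged sketch: sum-modified-Laplacian (SML) focus measure with a
# choose-max rule for fusing two corresponding high-frequency sub-bands,
# approximating the NSML comparison step the abstract describes.
import numpy as np
from scipy.ndimage import uniform_filter

def sml(band, window=3):
    # Modified Laplacian: |2I - left - right| + |2I - up - down|.
    p = np.pad(band, 1, mode="edge")
    ml = (np.abs(2 * p[1:-1, 1:-1] - p[1:-1, :-2] - p[1:-1, 2:])
          + np.abs(2 * p[1:-1, 1:-1] - p[:-2, 1:-1] - p[2:, 1:-1]))
    return uniform_filter(ml**2, window)  # local sum over a small window

def fuse_high(band_a, band_b):
    # Keep, per coefficient, the sub-band with the larger SML activity.
    return np.where(sml(band_a) >= sml(band_b), band_a, band_b)

a, b = np.random.randn(64, 64), np.random.randn(64, 64)  # placeholder sub-bands
print(fuse_high(a, b).shape)  # (64, 64)
```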