Multimodal medical image fusion, which combines complementary information from images of different modalities, is among the most important and practical techniques in numerous clinical applications. Although various conventional fusion techniques have been developed, conventional and machine learning approaches still suffer from complex weight-map computation, fixed fusion strategies, and a lack of contextual understanding, which often produce artefacts that degrade image quality. This work proposes an efficient hybrid learning model for medical image fusion that combines a pre-trained and a non-pre-trained network, namely VGG-19 and an SNN, through a stacking ensemble method. By leveraging the complementary strengths of each architecture, the model preserves detailed information with high visual quality across numerous combinations of image modalities, yielding notably improved contrast, higher resolution, and fewer artefacts. The ensemble is also more robust when fusing various combinations of source images drawn from publicly available sources, namely the Harvard Medical Image Fusion datasets, GitHub, and Kaggle. The proposed model outperforms existing fusion methods reported in the literature, such as PCA+DTCWT, NSCT, DWT, DTCWT+NSCT, GADCT, CNN, and VGG-19, in both visual quality and quantitative metrics.
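To illustrate the kind of feature-based fusion the abstract describes, the following is a minimal sketch, not the authors' implementation: it uses pre-trained VGG-19 features to derive a per-pixel weight map for two pre-registered, grayscale source images and blends them accordingly. The function names (`vgg_weight_map`, `fuse`) are hypothetical, and the SNN branch and the stacking combiner are indicated only as comments, since the abstract does not specify their exact formulation.

```python
# Sketch only: single VGG-19 branch of a hypothetical two-branch (VGG-19 + SNN)
# stacking ensemble for fusing two co-registered medical images.
import torch
import torch.nn.functional as F
from torchvision.models import vgg19, VGG19_Weights


def vgg_weight_map(img_a: torch.Tensor, img_b: torch.Tensor) -> torch.Tensor:
    """Return a soft weight map for img_a; (1 - map) applies to img_b.

    img_a, img_b: grayscale tensors of shape (1, 1, H, W), values in [0, 1].
    """
    # Early VGG-19 convolutional layers as a generic feature extractor.
    vgg = vgg19(weights=VGG19_Weights.IMAGENET1K_V1).features[:4].eval()
    with torch.no_grad():
        # Replicate the single channel to the 3 channels VGG-19 expects.
        feats = [vgg(x.repeat(1, 3, 1, 1)) for x in (img_a, img_b)]
    # L1 activity measure per pixel, upsampled back to image resolution.
    acts = [
        F.interpolate(f.abs().sum(1, keepdim=True), size=img_a.shape[-2:],
                      mode="bilinear", align_corners=False)
        for f in feats
    ]
    # Soft-max across the two activity maps gives normalized fusion weights.
    return torch.softmax(torch.cat(acts, dim=1), dim=1)[:, :1]


def fuse(img_a: torch.Tensor, img_b: torch.Tensor) -> torch.Tensor:
    w = vgg_weight_map(img_a, img_b)
    # In the full hybrid model, a stacking ensemble would combine this VGG-19
    # branch with an SNN branch; only the single-branch weighted fusion is
    # shown here for illustration.
    return w * img_a + (1 - w) * img_b
```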