Multimodal medical image fusion aims to integrate multisensor medical data into a single image that better conveys diagnostic details. This article presents a feature-level multimodal medical image fusion method that uses a two-scale ℓ1-ℓ0 hybrid layer decomposition to maximize structural detail while suppressing significant noise and artifacts. The proposed method uses a convolutional neural network (CNN) with consistency verification for fusing the decomposed base layers and fuzzy c-means-based structural patch clustering for fusing the detail layers. First, a color space transform separates the luminance and chrominance components of each source image, and the luminance part of each image is decomposed. In the second step, a pretrained CNN model extracts prominent features from each decomposed base layer. A regional energy-based activity measure is computed on the output feature maps to generate a fusion score, which is refined in a consistency verification step to optimize the weight map for fusing the base layers. The two-scale detail layers are merged using a prelearned, clustering-based dictionary that efficiently maps the structural details of the layers. The chrominance components of the two images are combined using a color saliency measure. Finally, the fused base layer, detail layers, and color components are merged to obtain the fused image. The performance of the proposed scheme is evaluated on a large multimodal medical data set of MR-SPECT, MR-PET, and CT-MR neurological images. The experimental results demonstrate the superiority of the proposed approach in both subjective and objective assessments, with improved fusion quality and computational performance compared with other state-of-the-art approaches.
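
As a rough illustration of the overall flow, the following Python sketch mirrors the steps described above with deliberately simplified stand-ins: Gaussian smoothing replaces the ℓ1-ℓ0 hybrid decomposition, an averaging rule and a max-absolute rule replace the CNN-based and dictionary-based fusion rules, and a distance-from-neutral-chroma test replaces the color saliency measure. All function names and parameter values here are illustrative assumptions, not the paper's actual operators.

```python
import cv2
import numpy as np

def two_scale_decompose(lum, sigma_small=2, sigma_large=8):
    """Stand-in two-scale decomposition: Gaussian smoothing is used
    here in place of the paper's l1-l0 hybrid solvers (assumption)."""
    mid = cv2.GaussianBlur(lum, (0, 0), sigma_small)
    coarse = cv2.GaussianBlur(lum, (0, 0), sigma_large)
    base = coarse             # base layer
    detail_1 = lum - mid      # fine-scale detail layer
    detail_2 = mid - coarse   # coarse-scale detail layer
    return base, detail_1, detail_2

def fuse_pipeline(img_a, img_b):
    """Skeleton of the described pipeline on two uint8 BGR sources."""
    # 1. Color space transform: separate luminance and chrominance.
    ycc_a = cv2.cvtColor(img_a, cv2.COLOR_BGR2YCrCb).astype(np.float32)
    ycc_b = cv2.cvtColor(img_b, cv2.COLOR_BGR2YCrCb).astype(np.float32)
    y_a, y_b = ycc_a[..., 0], ycc_b[..., 0]

    # 2. Two-scale decomposition of each luminance channel.
    base_a, d1_a, d2_a = two_scale_decompose(y_a)
    base_b, d1_b, d2_b = two_scale_decompose(y_b)

    # 3. Base/detail fusion (placeholders for the CNN-based and
    #    dictionary-based rules described in the abstract).
    base_f = 0.5 * (base_a + base_b)
    d1_f = np.where(np.abs(d1_a) >= np.abs(d1_b), d1_a, d1_b)
    d2_f = np.where(np.abs(d2_a) >= np.abs(d2_b), d2_a, d2_b)

    # 4. Chrominance fusion: simple stand-in for the color saliency
    #    measure; keep the channel farther from the neutral value 128.
    chroma_a, chroma_b = ycc_a[..., 1:], ycc_b[..., 1:]
    sal = np.abs(chroma_a - 128) >= np.abs(chroma_b - 128)
    chroma_f = np.where(sal, chroma_a, chroma_b)

    # 5. Recombine fused layers and convert back to BGR.
    y_f = np.clip(base_f + d1_f + d2_f, 0, 255)
    fused = np.concatenate([y_f[..., None], chroma_f], axis=-1)
    return cv2.cvtColor(fused.astype(np.uint8), cv2.COLOR_YCrCb2BGR)
```

Note that `base + detail_1 + detail_2` reconstructs the input luminance exactly, which is what allows the fused layers to be summed back into a single fused luminance channel in step 5.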
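The base-layer rule can also be sketched in isolation. In the snippet below, `feat_a` and `feat_b` are hypothetical stand-ins for the pretrained CNN's feature maps, the regional energy is taken as a local sum of squared responses, and a simple majority filter approximates the consistency verification step; the abstract gives only the general idea, so these specifics are assumptions.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def fuse_base_layers(base_a, base_b, feat_a, feat_b, win=3):
    """Sketch of a regional-energy fusion score with a
    consistency-verification refinement of the weight map."""
    # Regional energy: local mean of squared feature responses,
    # one plausible reading of an energy-based activity measure.
    act_a = uniform_filter(feat_a ** 2, size=win)
    act_b = uniform_filter(feat_b ** 2, size=win)

    # Initial binary fusion score (winner-take-all decision map).
    weight = (act_a >= act_b).astype(np.float32)

    # Consistency verification: flip isolated decisions that
    # disagree with the majority of their local neighborhood.
    weight = (uniform_filter(weight, size=win) > 0.5).astype(np.float32)

    return weight * base_a + (1.0 - weight) * base_b
```

The majority-filter refinement reflects the usual motivation for consistency verification: a pixel whose source decision differs from all of its neighbors is more likely a noisy activity estimate than a genuine feature boundary, so smoothing the decision map yields a cleaner weight map for fusion.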