ABSTRACT
The goal of medical image fusion is to integrate the diverse information carried by multimodal medical images. However, the limitations of imaging sensors and the incomplete retention of modal information make it difficult to produce images that encompass both functional and anatomical information. To overcome these obstacles, several medical image fusion techniques based on CNN or transformer architectures have been proposed. Nevertheless, CNN-based techniques struggle to establish long-range dependencies between the fused and source images, while transformer architectures often overlook shallow complementary features. To strengthen both the feature extraction capacity and the stability of the model, we introduce a dual-branch complementary feature injection fusion (CFIFusion) framework, an unsupervised multimodal medical image fusion approach that combines CNN and transformer models. Specifically, the entire source image and the segmented source image are fed into an adaptive backbone network to learn global and local features, respectively. To further retain the source images' complementary information, we design a multi-scale complementary feature extraction framework as an auxiliary module, which computes feature differences at each level to capture shallow complementary information. We also design a shallow information preservation module tailored to the characteristics of the sliced images. Experimental results on the Harvard whole brain atlas dataset demonstrate that CFIFusion outperforms recent state-of-the-art algorithms in both subjective and objective evaluations.
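As an illustration only, not the authors' implementation, the following PyTorch sketch shows one way a per-level feature-difference auxiliary module of this kind could be expressed. All class, argument, and variable names (ComplementaryFeatureInjection, global_feats, local_feats) are hypothetical placeholders.

```python
# Minimal sketch (assumed design, not the paper's code): at every scale, take the
# difference between global-branch and local-branch features as "complementary"
# information and inject it back into the main branch.
import torch
import torch.nn as nn

class ComplementaryFeatureInjection(nn.Module):
    def __init__(self, channels_per_level):
        super().__init__()
        # One 1x1 convolution per scale to re-project the difference map.
        self.proj = nn.ModuleList(
            nn.Conv2d(c, c, kernel_size=1) for c in channels_per_level
        )

    def forward(self, global_feats, local_feats):
        # global_feats / local_feats: lists of feature maps, one per scale,
        # with matching shapes at each level.
        injected = []
        for proj, g, l in zip(self.proj, global_feats, local_feats):
            diff = torch.abs(g - l)          # shallow complementary information
            injected.append(g + proj(diff))  # inject it back into the main branch
        return injected

# Example usage with three hypothetical feature scales:
if __name__ == "__main__":
    module = ComplementaryFeatureInjection([64, 128, 256])
    g = [torch.randn(1, c, s, s) for c, s in [(64, 64), (128, 32), (256, 16)]]
    l = [torch.randn(1, c, s, s) for c, s in [(64, 64), (128, 32), (256, 16)]]
    outs = module(g, l)
    print([o.shape for o in outs])
```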