Abstract

In recent years, several convolutional neural network (CNN)-based techniques have been proposed for various medical image fusion tasks. However, these methods cannot model the long-range dependencies between the fused image and the source images. To address this limitation, we propose DFENet, a multimodal medical image fusion framework that integrates CNN and vision transformer feature learning through self-supervised learning. DFENet is based on an encoder-decoder network that can be trained on a large-scale natural image dataset without the need for carefully collated ground-truth fusion images. The proposed network consists of an encoder, a feature fuser, and a decoder. The encoder is composed of a CNN module and a transformer module, which extract local and global image features, respectively. To avoid simple up-sampling and concatenation, we propose a new global semantic information aggregation module that efficiently aggregates the multi-scale features produced by the transformer module and enhances the quality of the reconstructed images. The decoder consists of six convolutional layers with two skip connections and reconstructs the image from the fused features. We also propose a fusion strategy that combines local energy and gradient information for fusing the features of magnetic resonance imaging (MRI) and functional medical images. Compared with conventional fusion rules, our fusion strategy is more robust to noisy images, and compared with existing competitive methods, our method retains more texture detail from the source images and produces a more natural and realistic fused image.
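To make the fusion strategy concrete, the following is a minimal Python sketch of a rule that weights two feature maps by a combination of local energy and gradient information. The window size, the Sobel gradient operator, the equal weighting of the two terms, and the soft weighting scheme are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np
from scipy.ndimage import uniform_filter, sobel


def activity(feat, win=3):
    """Activity measure combining local energy and gradient information.

    `win` and the equal weighting of the two terms are assumptions made
    for illustration; the published method may define them differently.
    """
    # Local energy: windowed mean of squared feature responses.
    energy = uniform_filter(feat ** 2, size=win)
    # Gradient information: magnitude of spatial derivatives.
    grad = np.hypot(sobel(feat, axis=0), sobel(feat, axis=1))
    return energy + grad


def fuse_features(feat_a, feat_b, win=3):
    """Weighted fusion of two feature maps based on their relative activity."""
    act_a = activity(feat_a, win)
    act_b = activity(feat_b, win)
    # Soft weights avoid hard switching between modalities in noisy regions.
    w_a = act_a / (act_a + act_b + 1e-8)
    return w_a * feat_a + (1.0 - w_a) * feat_b
```

In this sketch, `fuse_features` would blend encoder feature maps from the two modalities (e.g., an MRI feature map and a functional-image feature map), favoring whichever modality is more active in each local region before passing the result to the decoder.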
