Fusion of optical and synthetic aperture radar (SAR) images aims to integrate the valuable information in the source images. Recently, owing to their powerful generative capability, diffusion models, e.g., the denoising diffusion probabilistic model and the score-based diffusion model, have flourished in image processing, and several effective attempts have applied them to image fusion. However, diffusion models for image fusion suffer from unavoidable SAR speckle, which seriously obscures the useful information at the same location in the optical image. Moreover, these methods integrate pixel-level features without information for high-level tasks, e.g., target detection and image classification, so the fused images are inadequate for such tasks and yield low accuracy in them. To tackle these problems, we propose semantic information guided diffusion posterior sampling for image fusion. First, we employ SAR-BM3D as a preprocessing step to despeckle the SAR image. Then, the sampling model is established with a fidelity term, a regularization term, and a semantic information guidance term. The first two terms are obtained by the variational diffusion method via variational inference and first-order stochastic optimization. The last term is the cross-entropy loss between the annotation and the classification result from FLCNet, a network we design. Finally, experiments on the WHU-OPT-SAR and DDHRNet datasets validate the feasibility and superiority of the proposed method.
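The three-term structure of the sampling objective described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the squared-error fidelity term, the total-variation regularizer, the weights `lam_reg` and `lam_sem`, the flattened 1-D pixel lists, and all function names are hypothetical and are not the forms derived in the paper. Only the cross-entropy semantic term follows the description directly.

```python
import math


def cross_entropy(class_probs, label):
    """Cross-entropy between a one-hot annotation (given as a class index)
    and a vector of predicted class probabilities."""
    eps = 1e-12  # guards against log(0)
    return -math.log(max(class_probs[label], eps))


def guidance_objective(fused, optical, sar, class_probs, label,
                       lam_reg=0.1, lam_sem=0.5):
    """Weighted sum of fidelity, regularization, and semantic guidance terms.

    fused, optical, sar: flattened pixel lists of equal length.
    class_probs: classifier output for the fused image (hypothetical stand-in
                 for the FLCNet prediction).
    """
    n = len(fused)
    # Fidelity (assumed form): mean squared distance to both source images.
    fidelity = sum((f - o) ** 2 + (f - s) ** 2
                   for f, o, s in zip(fused, optical, sar)) / n
    # Regularization (assumed form): 1-D total variation of the fused signal.
    reg = sum(abs(fused[i + 1] - fused[i]) for i in range(n - 1)) / n
    # Semantic guidance: cross-entropy against the annotation.
    semantic = cross_entropy(class_probs, label)
    return fidelity + lam_reg * reg + lam_sem * semantic
```

In a posterior sampling loop, the gradient of such an objective with respect to the current estimate would steer each reverse diffusion step. Here the function only evaluates the scalar value, so a lower result means the fused estimate agrees better with the sources and with the annotation.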