As an important carrier of data, images contain a huge amount of information. The purpose of image fusion is to integrate the information from multiple source images into a single image. Since the source images depict the same scene, they share much redundant information. Common fusion methods do not filter this information, so the fusion process is often disturbed, which degrades the quality of the reconstructed fused image. To solve this problem, this paper explores strategies for information filtering and fusion control, and proposes a universal image fusion method based on the mask attention mechanism. The method consists of two stages: a pre-training stage and a formal fusion stage. In the pre-training stage, coarse-grained mask maps are generated and used to improve the mask autoencoder and the mask attention mechanism. In the formal fusion stage, guided by the coarse-grained mask maps, the mask autoencoder modifies the random masking process and discards redundant features shared by the source images. Meanwhile, the mask attention mechanism focuses on the differences between the source images and retains effective complementary information. Qualitative and quantitative experiments on datasets of different modalities validate the applicability of the model to multi-focus image fusion, infrared and visible image fusion, and medical image fusion. Our method achieves excellent performance on all these tasks and outperforms the existing fusion methods it is compared against. Our code is publicly available at https://github.com/xiangxiang-wang/MMAE.
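To make the contrast between random masking and mask-guided masking concrete, the following is a minimal sketch under stated assumptions. It is not the MMAE implementation. It assumes a hypothetical per-patch redundancy score derived from a coarse-grained mask map (higher means more redundant between sources). The sketch discards the most redundant patches instead of random ones, then applies plain scaled dot-product attention restricted to the kept patches. The function names, the score format, and the choice of generic masked attention are all illustrative assumptions.

```python
import math
import random
from typing import List, Sequence, Set


def random_mask(num_patches: int, mask_ratio: float, seed: int = 0) -> Set[int]:
    """MAE-style baseline: choose patch indices to mask uniformly at random."""
    k = int(round(num_patches * mask_ratio))
    rng = random.Random(seed)
    return set(rng.sample(range(num_patches), k))


def guided_mask(redundancy: Sequence[float], mask_ratio: float) -> Set[int]:
    """Hypothetical guided masking: mask the patches that a coarse-grained
    mask map scores as most redundant between the source images."""
    k = int(round(len(redundancy) * mask_ratio))
    # Sort indices by redundancy, highest first; ties broken by lower index.
    order = sorted(range(len(redundancy)), key=lambda i: (-redundancy[i], i))
    return set(order[:k])


def masked_attention(query: List[float], keys: List[List[float]],
                     values: List[List[float]], masked: Set[int]) -> List[float]:
    """Generic scaled dot-product attention over unmasked patches only."""
    if len(masked & set(range(len(keys)))) >= len(keys):
        raise ValueError("at least one patch must remain unmasked")
    d = len(query)
    scores = []
    for i, key in enumerate(keys):
        if i in masked:
            scores.append(float("-inf"))  # masked patches get zero weight
        else:
            scores.append(sum(q * k for q, k in zip(query, key)) / math.sqrt(d))
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]  # exp(-inf) evaluates to 0.0
    total = sum(weights)
    weights = [w / total for w in weights]
    # Weighted sum of value vectors, one output component per value dimension.
    return [sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))]


if __name__ == "__main__":
    redundancy = [0.9, 0.1, 0.8, 0.2]  # hypothetical per-patch scores
    masked = guided_mask(redundancy, 0.5)
    print(sorted(masked))  # [0, 2]
    keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
    values = [[1.0], [2.0], [3.0], [4.0]]
    print(masked_attention([1.0, 0.0], keys, values, masked))  # [3.0]
```

The design choice illustrated here is that the masking decision becomes deterministic given the mask map, so patches judged redundant never reach the attention step, while random masking may drop complementary patches just as easily as redundant ones.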