Abstract

Multi-Focus Image Fusion (MFIF) is an image enhancement task that fuses images whose different regions are in focus to produce a single all-in-focus image. In recent years, Generative Adversarial Network (GAN)-based approaches have significantly improved MFIF over Convolutional Neural Network (CNN) architectures. However, although vision transformers (ViTs) have achieved better results than CNNs in many high- and low-level vision problems thanks to their ability to model global connectivity, they had not been employed for MFIF until this study. Inspired by the Spatial-Temporal Transformer Network (STTN), we develop a Multi-image Transformer (MiT) for MFIF so that global connections can be modeled across multiple input images. We call the proposed transformer-based MFIF model MiT-MFIF, as it uses the developed MiT as its core component. We make several modifications to the baseline transformer to adapt vision transformers to the MFIF task. Comprehensive experiments on standard MFIF datasets demonstrate the effectiveness of the proposed MiT-MFIF. Moreover, unlike competing GAN-based methods, the proposed method requires no post-processing step, while outperforming state-of-the-art MFIF methods.
