Abstract

Multi-Focus Image Fusion (MFIF) is an image enhancement task that fuses images in which different regions are in focus to produce a single all-in-focus image. In recent years, Generative Adversarial Network (GAN)-based approaches have significantly improved MFIF performance over Convolutional Neural Network (CNN)-based methods. However, although vision transformers (ViTs) achieve better results than CNNs in many high- and low-level vision problems owing to their ability to model global connectivity, they had not been employed for MFIF before this study. Inspired by the Spatial-Temporal Transformer Network (STTN), we develop a Multi-image Transformer (MiT) for MFIF so that global connections can be modeled across multiple input images. We call the proposed transformer-based MFIF model MiT-MFIF, as it uses the developed MiT as its core component. We make several modifications to the baseline transformer to adapt vision transformers to the MFIF task. Comprehensive experiments on standard MFIF datasets demonstrate the effectiveness of the proposed MiT-MFIF. Moreover, unlike competing GAN-based methods, the proposed method does not require any post-processing step, while still outperforming state-of-the-art MFIF methods.
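The core idea of modeling global connections across multiple input images can be illustrated as attention computed jointly over tokens from all source images, so every token can attend to every region of every input. The sketch below is a minimal, hypothetical NumPy illustration of that idea with random placeholder weights; it is not the actual MiT-MFIF architecture, whose projections, heads, and modifications are learned and described in the paper.

```python
import numpy as np

def multi_image_attention(tokens_per_image, d_k=16, seed=0):
    """Single-head attention over tokens pooled from several images.

    tokens_per_image: list of (N, d) arrays, one per input image.
    Concatenating tokens from all images before attention lets each
    token attend across every input image (the "global connectivity
    across inputs" idea). Weights here are random placeholders, not
    the trained MiT-MFIF parameters.
    """
    rng = np.random.default_rng(seed)
    x = np.concatenate(tokens_per_image, axis=0)   # (M*N, d)
    d = x.shape[1]
    Wq, Wk, Wv = (rng.standard_normal((d, d_k)) for _ in range(3))
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d_k)                # (M*N, M*N)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)       # softmax over all tokens
    return attn @ v                                # (M*N, d_k)

# Two toy "images", each with 4 patch tokens of dimension 8.
fused_tokens = multi_image_attention([np.ones((4, 8)), np.zeros((4, 8))])
```

Because the attention matrix spans the concatenated token set, information from a sharply focused region in one image can directly influence the representation of the corresponding blurred region in another, which is the property that makes joint attention attractive for fusion.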
