Abstract
Infrared-visible image fusion aims to merge complementary information from both modalities into a more accurate scene representation. Current research focuses mainly on enhancing visual appeal rather than on improving performance in high-level vision tasks. To address this gap, we propose the Semantic Enhanced Multi-scale Cross-modality Interactive Image Fusion Network (SeMIFusion). First, a Multi-scale Cross-modality Feature Fusion (MCFF) module is devised to extract shallow and deep features across the two modalities. During feature extraction, Texture Enhancer (TE) and Semantify Enhancer (SE) blocks capture diverse hierarchical features across multi-scale layers, which feed into the Semantic Feature Integration (SFI) module for deep semantic information extraction. Furthermore, an Image Scene Reconstruction (ISR) module preserves original image details in the fused features, ensuring image fidelity. In addition, a visual preservation guiding mask prioritizes the retention of visual quality during reconstruction, preventing degradation. Extensive experiments demonstrate our method's superiority in preserving visual effects and texture details, especially in high-level vision tasks. The source code is available at https://github.com/yyzzttkkjj/SeMIFusion.
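To make the described pipeline concrete, the following is a minimal PyTorch sketch of how the named modules (MCFF, TE, SE, SFI, ISR) could fit together. Every module body below is a hypothetical stand-in chosen for illustration only, not the authors' implementation; the actual architecture is available in the linked repository.

```python
# Minimal sketch of the SeMIFusion pipeline as outlined in the abstract.
# All module internals are illustrative assumptions; see
# https://github.com/yyzzttkkjj/SeMIFusion for the authors' real code.
import torch
import torch.nn as nn


class EnhancerBlock(nn.Module):
    """Residual stand-in for the Texture Enhancer (TE) / Semantify Enhancer (SE) blocks."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)


class MCFF(nn.Module):
    """Stand-in Multi-scale Cross-modality Feature Fusion: concatenate the
    infrared and visible feature maps and project back to one stream."""
    def __init__(self, channels: int):
        super().__init__()
        self.proj = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, f_ir, f_vis):
        return self.proj(torch.cat([f_ir, f_vis], dim=1))


class SeMIFusionSketch(nn.Module):
    def __init__(self, channels: int = 32):
        super().__init__()
        self.embed_ir = nn.Conv2d(1, channels, 3, padding=1)   # infrared input (1 channel)
        self.embed_vis = nn.Conv2d(1, channels, 3, padding=1)  # visible luminance (1 channel)
        self.te = EnhancerBlock(channels)   # texture enhancement on shallow features
        self.fuse = MCFF(channels)
        self.se = EnhancerBlock(channels)   # semantic enhancement on the fused stream
        self.sfi = EnhancerBlock(channels)  # stand-in for Semantic Feature Integration (SFI)
        self.isr = nn.Conv2d(channels, 1, 3, padding=1)  # Image Scene Reconstruction (ISR) head

    def forward(self, ir, vis):
        # Routing TE to per-modality shallow features and SE/SFI to the fused
        # stream is a simplification; the paper applies them across scales.
        f_ir = self.te(self.embed_ir(ir))
        f_vis = self.te(self.embed_vis(vis))
        fused = self.sfi(self.se(self.fuse(f_ir, f_vis)))
        return torch.sigmoid(self.isr(fused))  # fused single-channel image in [0, 1]


# Quick shape check on dummy inputs:
ir = torch.rand(1, 1, 128, 128)
vis = torch.rand(1, 1, 128, 128)
print(SeMIFusionSketch()(ir, vis).shape)  # torch.Size([1, 1, 128, 128])
```

Note that this sketch operates at a single scale and omits the visual preservation guiding mask; a multi-scale version would apply the same fusion pattern at several encoder resolutions.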