Abstract

Integrating complementary information from different modalities is one of the key challenges in image fusion. Most existing deep learning-based methods still rely on a one-off fusion layer to merge the features extracted from the two modalities. Such an interaction pattern only integrates significant features and neglects the removal of harmful information that is widely present in the source images. To overcome these limitations, we propose a progressive token exchanging Transformer for infrared and visible image fusion, named PTET. In contrast to a one-off fusion layer, we devise a progressive token exchange strategy that gradually transfers features from the source images while simultaneously removing harmful information. A predictor assesses the saliency of the Transformer tokens from both modalities. An exchanger then transfers beneficial tokens and eliminates insignificant ones. Through cascaded layers, our network progressively enhances the features of the fusion branch. An exchange loss and a rank loss are introduced to constrain the fusion network. Extensive experiments on the MSRS and LLVIP datasets demonstrate the superiority of PTET over nine state-of-the-art alternatives. Visualizations of the token exchange strategy and an ablation study confirm the effectiveness of our designs.
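
As a rough illustration of the predictor-then-exchanger idea described above, the following minimal sketch performs one exchange step on toy tokens. It is not the PTET implementation: the function name `exchange_tokens`, the `drop_threshold` parameter, and the rule "keep the higher-scoring modality, zero out positions where both scores are low" are all assumptions made for illustration, and the saliency scores are taken as given rather than produced by a learned predictor.

```python
from typing import List, Sequence

Token = List[float]


def exchange_tokens(
    ir_tokens: Sequence[Token],
    vis_tokens: Sequence[Token],
    ir_scores: Sequence[float],
    vis_scores: Sequence[float],
    drop_threshold: float = 0.2,  # hypothetical cut-off, not from the paper
) -> List[Token]:
    """One illustrative exchange step over aligned token sequences.

    For each token position:
      * if both saliency scores fall below `drop_threshold`, the token is
        treated as insignificant and replaced by zeros (elimination);
      * otherwise the token from the more salient modality is passed to
        the fusion branch (transfer).
    """
    if not (len(ir_tokens) == len(vis_tokens) == len(ir_scores) == len(vis_scores)):
        raise ValueError("token and score sequences must have equal length")

    fused: List[Token] = []
    for ir_tok, vis_tok, s_ir, s_vis in zip(ir_tokens, vis_tokens, ir_scores, vis_scores):
        if max(s_ir, s_vis) < drop_threshold:
            fused.append([0.0] * len(ir_tok))  # eliminate insignificant token
        elif s_ir >= s_vis:
            fused.append(list(ir_tok))         # transfer infrared token
        else:
            fused.append(list(vis_tok))        # transfer visible token
    return fused


# Toy usage: three token positions, two channels each.
ir = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
vis = [[-1.0, -1.0], [-2.0, -2.0], [-3.0, -3.0]]
fused = exchange_tokens(ir, vis, ir_scores=[0.9, 0.1, 0.05], vis_scores=[0.3, 0.8, 0.1])
```

In a progressive scheme such a step would be repeated across cascaded layers, with scores re-estimated at each layer; the hard selection used here stands in for whatever differentiable mechanism the actual network uses.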
