Abstract

Although extensive research has been carried out on visible and infrared image fusion, quality assessment of the fused image remains challenging due to the absence of a reference image. In this paper, a subjective benchmark dataset and a semi-reference objective assessment method based on a Transformer encoder–decoder framework, named Semantic-Relation Transformer (SRT), are developed for Visible and Infrared Fused Image Quality Assessment (VIF-IQA). Unlike existing Transformers, the SRT decoder extracts multi-level source-image features and adopts a Multi-Head Self-Evaluation (MHSE) block constructed to mine latent relation knowledge between the fused image and the source images. The relation knowledge is then injected into a 3D-token carrying deep semantic embeddings of receptive regions. Finally, the objective assessment score is obtained from the 3D-token through a local-to-global linear mapping. Moreover, we meticulously select 4,000 fused images from 200 scenes in the TNO, MSRS, M3FD and Road Scene datasets to create a Visible and Infrared fuSed qualiTy Assessment (VISTA) dataset, rigorously guided by a Subjective VIF-IQA Specification. The VISTA dataset is used to comprehensively validate the proposed SRT. Experimental results demonstrate that SRT achieves state-of-the-art performance on quantitative metrics compared with 12 popular methods, and that its output agrees more closely with subjective human perception. The VISTA dataset is available at https://github.com/ChangeZH/VISTA-Dataset.
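The abstract describes the MHSE block and the local-to-global scoring head only at a high level. The sketch below is one plausible PyTorch rendering of that pipeline, not the authors' implementation: the module names, token dimensions, and the specific choices of cross-attention for relation mining and mean pooling for local-to-global aggregation are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

class MHSEBlock(nn.Module):
    """Hypothetical sketch of a Multi-Head Self-Evaluation (MHSE) block.

    Assumption: relation knowledge between the fused image and its source
    images is mined via cross-attention, with fused-image tokens as queries
    and concatenated visible/infrared tokens as keys and values.
    """
    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, fused_tokens, vis_tokens, ir_tokens):
        # Queries come from the fused image; keys/values from both sources.
        src = torch.cat([vis_tokens, ir_tokens], dim=1)
        rel, _ = self.attn(fused_tokens, src, src)
        # Residual addition stands in for "injecting" relation knowledge
        # into the region tokens.
        return self.norm(fused_tokens + rel)

class ScoreHead(nn.Module):
    """Hypothetical local-to-global head: pool per-region tokens into one
    global embedding, then map it linearly to a scalar quality score."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.fc = nn.Linear(dim, 1)

    def forward(self, tokens):
        global_token = tokens.mean(dim=1)  # aggregate local receptive regions
        return self.fc(global_token).squeeze(-1)

# Toy usage with random token sequences (batch=2, 196 regions, dim=256).
if __name__ == "__main__":
    fused = torch.randn(2, 196, 256)
    vis = torch.randn(2, 196, 256)
    ir = torch.randn(2, 196, 256)
    score = ScoreHead()(MHSEBlock()(fused, vis, ir))
    print(score.shape)  # torch.Size([2])
```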
