Abstract

In the realm of infrared and visible scene parsing, satisfactory performance has been achieved by leveraging the complementary nature of the infrared and visible imaging modalities. Existing methods employ various strategies to fuse cross-modality features. However, these strategies typically integrate features at the same level (i.e., the same network depth), neglecting potential interactions across different levels. To address this limitation, we introduce a novel concept called misalignment fusion, which merges multimodality feature maps drawn from distinct levels. Building on this concept, we propose a misalignment fusion network (MFNet) designed for infrared and visible urban scene parsing. Our network incorporates a misalignment-guided fusion module to integrate cross-modality features, as well as an adaptive refined selective fusion module to combine the segmentation maps predicted by two parallel-branch decoders. Extensive experiments demonstrate that the proposed MFNet consistently surpasses existing state-of-the-art methods in infrared and visible urban scene parsing.
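To make the misalignment-fusion idea concrete, the following is a minimal sketch of fusing an infrared feature map from one encoder level with a visible feature map from a different (deeper) level. It is not the authors' implementation; the module name, channel widths, concatenation-based fusion, and bilinear resampling are illustrative assumptions.

```python
# Minimal sketch of cross-level ("misaligned") fusion: NOT the paper's
# misalignment-guided fusion module. All names and sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossLevelFusion(nn.Module):
    """Fuse an infrared feature map from one encoder level with a visible
    feature map taken from a different level (here: one level deeper)."""

    def __init__(self, ir_channels: int, vis_channels: int, out_channels: int):
        super().__init__()
        # Project the concatenated cross-level features to a common width.
        self.project = nn.Sequential(
            nn.Conv2d(ir_channels + vis_channels, out_channels, kernel_size=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, ir_feat: torch.Tensor, vis_feat: torch.Tensor) -> torch.Tensor:
        # Features from different depths have different spatial resolutions,
        # so resample the visible features to match the infrared ones.
        vis_feat = F.interpolate(
            vis_feat, size=ir_feat.shape[-2:], mode="bilinear", align_corners=False
        )
        return self.project(torch.cat([ir_feat, vis_feat], dim=1))


if __name__ == "__main__":
    # Example: infrared features from level 2 (64 ch, 1/4 resolution) fused
    # with visible features from level 3 (128 ch, 1/8 resolution).
    ir_l2 = torch.randn(1, 64, 64, 64)
    vis_l3 = torch.randn(1, 128, 32, 32)
    fused = CrossLevelFusion(64, 128, 128)(ir_l2, vis_l3)
    print(fused.shape)  # torch.Size([1, 128, 64, 64])
```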
