Abstract
In the realm of infrared and visible scene parsing, satisfactory performance has been achieved by leveraging the complementary nature of infrared and visible imaging modalities. Existing methods have employed various strategies to fuse cross-modality features. However, these strategies typically integrate features at the same level (i.e., the same network depth), neglecting potential interactions across different levels. To address this limitation, we introduce a novel concept, misalignment fusion, which merges multimodality feature maps drawn from distinct levels. Based on this concept, we propose a misalignment fusion network (MFNet) designed for infrared and visible urban scene parsing. Our network incorporates a misalignment-guided fusion module to integrate cross-modality features, as well as an adaptive refined selective fusion module to combine the predicted segmentation maps produced by two parallel-branch decoders. Extensive experiments have been conducted to evaluate the proposed MFNet. The results consistently demonstrate that our approach surpasses existing state-of-the-art methods in infrared and visible urban scene parsing.
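The abstract does not detail the internals of the misalignment-guided fusion module; the following PyTorch-style sketch only illustrates the general idea of cross-level ("misaligned") fusion, i.e., combining an infrared feature map from one encoder level with a visible feature map from a different level. All module names, channel sizes, and the projection-plus-concatenation design below are assumptions for illustration, not the paper's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossLevelFusion(nn.Module):
    """Hypothetical sketch: fuse an infrared feature from level i with a
    visible feature from a different level j (different depth/resolution)."""

    def __init__(self, ir_channels, vis_channels, out_channels):
        super().__init__()
        # 1x1 convolutions project both modalities to a common channel width.
        self.ir_proj = nn.Conv2d(ir_channels, out_channels, kernel_size=1)
        self.vis_proj = nn.Conv2d(vis_channels, out_channels, kernel_size=1)
        self.fuse = nn.Conv2d(out_channels * 2, out_channels, kernel_size=3, padding=1)

    def forward(self, ir_feat_level_i, vis_feat_level_j):
        ir = self.ir_proj(ir_feat_level_i)
        vis = self.vis_proj(vis_feat_level_j)
        # Resample the visible feature from level j to the spatial size of
        # level i so features taken at different depths can be combined.
        vis = F.interpolate(vis, size=ir.shape[-2:], mode="bilinear", align_corners=False)
        return self.fuse(torch.cat([ir, vis], dim=1))


if __name__ == "__main__":
    # Example: level-2 infrared features (1/8 scale) fused with
    # level-3 visible features (1/16 scale); shapes are arbitrary.
    ir_l2 = torch.randn(1, 128, 60, 80)
    vis_l3 = torch.randn(1, 256, 30, 40)
    fused = CrossLevelFusion(128, 256, 128)(ir_l2, vis_l3)
    print(fused.shape)  # torch.Size([1, 128, 60, 80])
```

The key departure from conventional same-level fusion is simply that the two inputs come from different network depths, which requires resampling one feature map before the modalities are merged.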