Abstract

By exploiting the complementary information of the RGB and thermal modalities, RGB-thermal (RGB-T) semantic segmentation is robust to adverse lighting conditions. Existing methods design various strategies for fusing features from RGB and thermal images, but most of them overlook the modality differences caused by the two distinct imaging mechanisms, which can result in insufficient use of complementary information. To address this issue, we propose a novel Mask-guided Modality Difference Reduction Network (MMDRNet), in which a mask guides image reconstruction so that the modality discrepancy within foreground regions is minimized. This yields more discriminative representations for foreground pixels and thus facilitates the segmentation task. On top of this, we present a Dynamic Task Balance (DTB) method that dynamically balances the modality difference reduction task and the semantic segmentation task. Experimental results on the MFNet and PST900 datasets demonstrate the superiority of the proposed mask-guided modality difference reduction strategy and the effectiveness of the DTB method.
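The abstract gives no implementation details, but the two ideas can be sketched in PyTorch for intuition: a reconstruction loss that is masked so only foreground pixels contribute to the modality-discrepancy penalty, and a learned weighting between that loss and the segmentation loss. Everything below is an assumption for illustration: the L1 loss form, the tensor shapes, and the use of homoscedastic-uncertainty weighting (Kendall et al., 2018) as a stand-in for the paper's DTB formulation.

```python
# Minimal sketch of a mask-guided modality difference reduction loss and a
# dynamic task-balancing module. Loss forms and weighting scheme are
# assumptions, not the paper's actual method.
import torch
import torch.nn as nn
import torch.nn.functional as F

def masked_reconstruction_loss(recon_rgb, recon_thermal, mask):
    """Penalize RGB/thermal reconstruction differences only inside the
    foreground regions given by `mask` (1 = foreground, 0 = background),
    so modality discrepancy is reduced where it matters for segmentation.
    (Assumed L1 distance; the paper may use a different measure.)"""
    diff = (recon_rgb - recon_thermal).abs() * mask
    return diff.sum() / mask.sum().clamp(min=1.0)

class DynamicTaskBalance(nn.Module):
    """Stand-in for DTB: learns one log-variance per task and rebalances
    the segmentation and difference-reduction losses during training
    (uncertainty weighting, Kendall et al., 2018). The actual DTB
    formulation may differ."""
    def __init__(self, num_tasks=2):
        super().__init__()
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))

    def forward(self, losses):
        total = 0.0
        for loss, log_var in zip(losses, self.log_vars):
            # Down-weight noisy tasks; the +log_var term prevents the
            # learned weights from collapsing to zero.
            total = total + torch.exp(-log_var) * loss + log_var
        return total

# Hypothetical training step: `model` is assumed to return segmentation
# logits and per-modality reconstructions; none of these names come from
# the paper.
#   logits, recon_rgb, recon_thermal = model(rgb, thermal)
#   seg_loss = F.cross_entropy(logits, labels)
#   fg_mask = (labels > 0).float().unsqueeze(1)
#   mdr_loss = masked_reconstruction_loss(recon_rgb, recon_thermal, fg_mask)
#   loss = dtb([seg_loss, mdr_loss])
```

Restricting the reconstruction penalty to the mask is the key design choice: background regions, where RGB and thermal appearance legitimately diverge, are left out of the discrepancy term.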
