Abstract

By exploiting the complementary information of the RGB and thermal modalities, RGB-thermal (RGB-T) semantic segmentation is robust to adverse lighting conditions. Existing methods design various strategies for fusing features from RGB and thermal images, but most of them overlook the modality differences caused by the two distinct imaging mechanisms, which can result in insufficient use of complementary information. To address this issue, we propose a novel Mask-guided Modality Difference Reduction Network (MMDRNet), in which a mask guides image reconstruction so that the modality discrepancy within foreground regions is minimized. This yields more discriminative representations for foreground pixels and thus facilitates the segmentation task. On top of this, we present a Dynamic Task Balance (DTB) method that dynamically balances the modality difference reduction task and the semantic segmentation task. Experimental results on the MFNet and PST900 datasets demonstrate the superiority of the proposed mask-guided modality difference reduction strategy and the effectiveness of the DTB method.
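The abstract gives no implementation details, but the two ideas can be sketched in PyTorch for intuition: a reconstruction loss that is masked so only foreground pixels contribute to the modality-discrepancy penalty, and a learned weighting between that loss and the segmentation loss. Everything below is an assumption for illustration: the L1 loss form, the tensor shapes, and the use of homoscedastic-uncertainty weighting (Kendall et al., 2018) as a stand-in for the paper's DTB formulation.

```python
# Minimal sketch of a mask-guided modality difference reduction loss and a
# dynamic task-balancing module. Loss forms and weighting scheme are
# assumptions, not the paper's actual method.
import torch
import torch.nn as nn
import torch.nn.functional as F

def masked_reconstruction_loss(recon_rgb, recon_thermal, mask):
    """Penalize RGB/thermal reconstruction differences only inside the
    foreground regions given by `mask` (1 = foreground, 0 = background),
    so modality discrepancy is reduced where it matters for segmentation.
    (Assumed L1 distance; the paper may use a different measure.)"""
    diff = (recon_rgb - recon_thermal).abs() * mask
    return diff.sum() / mask.sum().clamp(min=1.0)

class DynamicTaskBalance(nn.Module):
    """Stand-in for DTB: learns one log-variance per task and rebalances
    the segmentation and difference-reduction losses during training
    (uncertainty weighting, Kendall et al., 2018). The actual DTB
    formulation may differ."""
    def __init__(self, num_tasks=2):
        super().__init__()
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))

    def forward(self, losses):
        total = 0.0
        for loss, log_var in zip(losses, self.log_vars):
            # Down-weight noisy tasks; the +log_var term prevents the
            # learned weights from collapsing to zero.
            total = total + torch.exp(-log_var) * loss + log_var
        return total

# Hypothetical training step: `model` is assumed to return segmentation
# logits and per-modality reconstructions; none of these names come from
# the paper.
#   logits, recon_rgb, recon_thermal = model(rgb, thermal)
#   seg_loss = F.cross_entropy(logits, labels)
#   fg_mask = (labels > 0).float().unsqueeze(1)
#   mdr_loss = masked_reconstruction_loss(recon_rgb, recon_thermal, fg_mask)
#   loss = dtb([seg_loss, mdr_loss])
```

Restricting the reconstruction penalty to the mask is the key design choice: background regions, where RGB and thermal appearance legitimately diverge, are left out of the discrepancy term.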
