Multispectral object detection for autonomous driving has received significant attention in recent years, since the complementary visual information provided by multispectral (i.e., RGB and thermal) images can produce more reliable detection results, which is crucial for the safety of autonomous driving. Previous studies have mainly focused on leveraging illumination information in decision-level fusion and have struggled to generate reasonable illumination weights. In this paper, we propose IGT, an illumination-guided RGB-T object detection framework based on transformers, which effectively generates appropriate illumination weights and uses them to guide the multispectral feature rectification and fusion processes. Specifically, an illumination-based sub-network (IBSN) is first proposed to produce reasonable illumination weights from the mid-level features of the RGB stream. Then, we design an illumination-guided feature rectification module (IFRM) that determines, during rectification, how strongly to exploit the thermal modality's illumination invariance according to the illumination intensity of the RGB image. With the rectified feature pairs, we deploy an illumination-guided feature fusion module (IFFM), which uses a novel illumination-based loss function to promote multispectral feature fusion and generate illumination-invariant fused features. Extensive experiments show that IGT achieves state-of-the-art performance on the FLIR dataset, and detailed analyses show that our network is robust under varying illumination conditions.
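The abstract does not specify how the predicted illumination weight enters the fusion step. As a minimal, hypothetical sketch (the function name, the scalar weight `w`, and the convex-combination form are assumptions, not the paper's actual IFFM), an illumination weight in [0, 1] could gate the contribution of each modality's feature map:

```python
import numpy as np

def illumination_guided_fusion(f_rgb, f_thermal, w):
    """Hypothetical illumination-weighted fusion of RGB and thermal features.

    w in [0, 1] is an illumination confidence for the RGB image
    (e.g., as a sub-network like IBSN might predict): a high w
    (well-lit scene) favors RGB features, a low w (night scene)
    favors the illumination-invariant thermal features.
    """
    assert f_rgb.shape == f_thermal.shape
    return w * f_rgb + (1.0 - w) * f_thermal

# Toy feature maps of shape (channels, height, width)
f_rgb = np.ones((4, 8, 8))      # stand-in RGB feature map
f_thermal = np.zeros((4, 8, 8)) # stand-in thermal feature map

fused_day = illumination_guided_fusion(f_rgb, f_thermal, 0.9)    # daytime: mostly RGB
fused_night = illumination_guided_fusion(f_rgb, f_thermal, 0.1)  # night: mostly thermal
```

This is only a sketch of the general idea of illumination-conditioned fusion; the paper's IFFM operates on transformer features and is trained with an illumination-based loss rather than using a fixed linear blend.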