Abstract

Cross-modality object re-identification (NIR-VIS object Re-ID), which aims to match object identities across images of different modalities, is a challenging task that has attracted increasing attention due to its wide application in low-light scenes. However, dramatic modality-dependent and camera-related discrepancies between Near-InfraRed-spectrum (NIR) and VISible-spectrum (VIS) images lead to a considerable intra-class gap in the feature space. To address this problem, we propose a novel Modality and Camera factors Bi-Disentanglement (MCBD) model that learns modality-independent and camera-unrelated features for NIR-VIS object Re-ID. Our model consists of three key modules: Confused Modality Generation (CMG), Modality-independent Information Distillation (MID), and Camera Factor Disentanglement (CFD). First, to align the image styles of NIR and VIS data, CMG employs a channel-interactive generator to produce confused-modality images that preserve the structural information of the original images. In addition, CMG is trained with confused adversarial learning, which bridges the modality gap at the image level. However, training the model on confused-modality images alone discards identity-related information such as color and contrast, which hinders the extraction of discriminative features. To solve this problem, MID distills modality-independent information by feeding the original images to the model and reconstructing the corresponding NIR and VIS modality features toward the confused-modality images. Finally, because camera-related information in images is complex, the identity representation inevitably contains camera-related elements such as background and viewpoint, which may interfere with the matching process. To address this issue, CFD disentangles camera-related factors through a designed three-stream network and two factor-decoupling losses, i.e., the Camera-Camera Factor (CCF) loss and the Identity-Camera Factor (ICF) loss. Comprehensive experiments on two cross-modality pedestrian Re-ID datasets and a cross-modality vehicle Re-ID dataset demonstrate that MCBD is effective for the cross-modality object Re-ID task.
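
The abstract does not specify how the channel-interactive generator is built. As a rough illustration only, the PyTorch sketch below shows one plausible reading: channel weights are computed jointly from an NIR image and a VIS image and applied to both, so channel statistics (style) are mixed while the spatial structure of each input is kept. Every layer, size, and name here (e.g., `ChannelInteractiveGenerator`) is a hypothetical assumption, not the paper's architecture.

```python
import torch
import torch.nn as nn

class ChannelInteractiveGenerator(nn.Module):
    """Toy generator: per-channel weights are computed jointly from an NIR
    and a VIS image, then applied to both to yield 'confused' images.
    All layer choices are illustrative assumptions."""
    def __init__(self, channels: int = 3):
        super().__init__()
        # joint channel interaction: squeeze both modalities, excite shared weights
        self.fc = nn.Sequential(
            nn.Linear(2 * channels, channels),
            nn.ReLU(inplace=True),
            nn.Linear(channels, channels),
            nn.Sigmoid(),
        )
        self.refine = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x_nir: torch.Tensor, x_vis: torch.Tensor):
        d_nir = x_nir.mean(dim=(2, 3))                 # (B, C) channel descriptor
        d_vis = x_vis.mean(dim=(2, 3))
        w = self.fc(torch.cat([d_nir, d_vis], dim=1))  # weights conditioned on BOTH
        w = w.unsqueeze(-1).unsqueeze(-1)              # (B, C, 1, 1)
        # reweight channels of each input and refine; spatial structure is kept
        return self.refine(x_nir * w), self.refine(x_vis * w)

gen = ChannelInteractiveGenerator()
nir, vis = torch.randn(2, 3, 64, 64), torch.randn(2, 3, 64, 64)
c_nir, c_vis = gen(nir, vis)
print(c_nir.shape, c_vis.shape)  # torch.Size([2, 3, 64, 64]) for both
```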
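Similarly, for MID the abstract only states that features of the original NIR and VIS images are reconstructed toward the confused-modality images. A minimal sketch of such a reconstruction objective, assuming a toy decoder and an L1 target (both assumptions, not the paper's design), might look like this:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureDecoder(nn.Module):
    """Toy decoder mapping a (B, 64, H, W) feature map back to image space;
    the actual decoder design is not given in the abstract."""
    def __init__(self, feat_ch: int = 64, img_ch: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(feat_ch, 32, kernel_size=4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, img_ch, kernel_size=3, padding=1),
        )

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        return self.net(feat)

def mid_loss(decoder, feat_nir, feat_vis, confused_img):
    # Both modality features are reconstructed toward the SAME confused image,
    # so information that distinguishes NIR from VIS cannot help the decoder
    # and modality-independent content is distilled into the features.
    return (F.l1_loss(decoder(feat_nir), confused_img)
            + F.l1_loss(decoder(feat_vis), confused_img))

decoder = FeatureDecoder()
feat_nir, feat_vis = torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32)
confused = torch.randn(2, 3, 64, 64)
print(mid_loss(decoder, feat_nir, feat_vis, confused))
```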
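Finally, the CCF and ICF losses are named but not defined in the abstract. One hedged guess at their intent is that ICF decorrelates identity and camera features of the same image, while CCF aligns camera features within a camera and separates them across cameras; the concrete formulas below are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F

def icf_loss(f_id: torch.Tensor, f_cam: torch.Tensor) -> torch.Tensor:
    # Assumed ICF: penalize similarity between the identity feature and the
    # camera feature of the same image, pushing the two factors apart.
    return F.cosine_similarity(f_id, f_cam, dim=1).abs().mean()

def ccf_loss(f_cam: torch.Tensor, cam_ids: torch.Tensor,
             margin: float = 0.5) -> torch.Tensor:
    # Assumed CCF: camera features should agree within a camera and differ
    # across cameras (a simple pairwise contrastive form).
    f = F.normalize(f_cam, dim=1)
    sim = f @ f.t()                                     # pairwise cosine similarity
    same = cam_ids.unsqueeze(0) == cam_ids.unsqueeze(1)
    diag = torch.eye(len(cam_ids), dtype=torch.bool, device=f.device)
    pos = sim[same & ~diag]                             # same camera, distinct images
    neg = sim[~same]                                    # different cameras
    pull = (1.0 - pos).mean() if pos.numel() else sim.new_zeros(())
    push = F.relu(neg - margin).mean() if neg.numel() else sim.new_zeros(())
    return pull + push

# Toy usage: 8 images from 4 cameras, 256-d identity and camera features.
f_id, f_cam = torch.randn(8, 256), torch.randn(8, 256)
cams = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
total = icf_loss(f_id, f_cam) + ccf_loss(f_cam, cams)
print(total)
```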
