Visible–infrared person re-identification (VIPR) plays an important role in intelligent transportation systems. Modal discrepancies between visible and infrared images severely hinder the discrimination of person appearance; for example, the similarity between images of the same identity across modalities is often lower than the similarity between different identities within the same modality. Worse still, modal discrepancies and appearance discrepancies are coupled with each other. The prevailing practice is to disentangle them, but this usually requires complex decoupling networks. In this paper, rather than disentangling, we propose to measure and optimize modal discrepancies directly. We explore a cross-modal group-relation (CMGR) that describes the relationship between the same group of people observed in the two modalities. Because it considers groups, which are more stable than individuals, the CMGR has great potential for modal invariance and thus serves as a good measurement of modal discrepancy. Furthermore, we design a group-relation correlation (GRC) loss function based on the Pearson correlation to optimize the CMGR; it can be easily integrated with the learning of VIPR's appearance features. Consequently, our CMGR model serves as a pivotal constraint that minimizes modal discrepancies, operating much like a loss function: it is applied only during training, so it requires no execution at inference. Experimental results on two public datasets (i.e., RegDB and SYSU-MM01) demonstrate that our CMGR method is superior to state-of-the-art approaches. In particular, on the RegDB dataset, CMGR improves the rank-1 identification rate by more than 7% over the same model trained without it.
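To make the idea concrete, below is a minimal PyTorch sketch of a Pearson-correlation-based group-relation loss in the spirit of the GRC described above. The abstract does not specify the exact form of the group relation, so this sketch assumes it is the pairwise cosine-similarity matrix over a batch of per-person features within each modality; the function names (`group_relation`, `grc_loss`) and all hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of a group-relation correlation (GRC) loss.
# ASSUMPTION: the "group relation" is modeled as the pairwise
# cosine-similarity matrix of features within one modality.
import torch
import torch.nn.functional as F


def group_relation(features: torch.Tensor) -> torch.Tensor:
    """Pairwise cosine-similarity matrix for a group of person features.

    features: (N, D) tensor, one row per person in the group.
    Returns an (N, N) relation matrix.
    """
    f = F.normalize(features, dim=1)
    return f @ f.t()


def grc_loss(vis_feats: torch.Tensor, ir_feats: torch.Tensor) -> torch.Tensor:
    """1 minus the Pearson correlation between the visible and infrared
    group-relation matrices for the same group of people (rows aligned
    by identity). A smaller loss means the two modalities agree on how
    the group members relate to one another."""
    r_vis = group_relation(vis_feats).flatten()
    r_ir = group_relation(ir_feats).flatten()
    r_vis = r_vis - r_vis.mean()          # center before correlating
    r_ir = r_ir - r_ir.mean()
    pearson = (r_vis * r_ir).sum() / (r_vis.norm() * r_ir.norm() + 1e-8)
    return 1.0 - pearson                  # maximize correlation


if __name__ == "__main__":
    vis = torch.randn(8, 256)  # 8 people, visible-modality features
    ir = torch.randn(8, 256)   # same 8 people, infrared-modality features
    print(grc_loss(vis, ir))   # training-only term; unused at inference
```

In this formulation the loss constrains only the relational structure of a group across modalities rather than individual feature values, which is consistent with the abstract's point that the constraint acts like an auxiliary loss during training and adds no cost at inference.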