ABSTRACT
Visible-Infrared Person Re-Identification (VI-ReID) is a challenging cross-modality retrieval task whose goal is to recognize individuals across images captured by RGB and IR cameras. While many existing methods narrow the gap between modalities by designing feature-level constraints, they often neglect the consistency of channel-statistics information across modalities, which leads to suboptimal matching performance. In this work, we introduce a new approach for VI-ReID that incorporates Cross-Composition Normalization (CCN) and Self-Enrichment Normalization (SEN). Cross-Composition Normalization is a plug-and-play module that can be seamlessly integrated into shallow CNN layers without modifying the training objectives. It probabilistically blends feature statistics between instances, encouraging the model to learn inter-modality feature distributions. Self-Enrichment Normalization, in contrast, leverages attention mechanisms to recalibrate statistics, effectively bridging the gap between training and test distributions and markedly boosting the discriminability of features for VI-ReID. To validate the efficacy of the proposed method, we carried out comprehensive experiments on two public cross-modality datasets. The results demonstrate the superiority of our Cross-Composition and Self-Enrichment normalization techniques in addressing the challenges of the VI-ReID problem.
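The abstract does not give CCN's exact formulation, but the idea of probabilistically blending per-channel feature statistics between instances can be sketched as follows. This is a minimal, hypothetical NumPy illustration assuming AdaIN-style (mean, std) channel statistics and a Beta-sampled mixing weight; the names `mix_channel_stats` and `lam` are ours, not the paper's.

```python
import numpy as np

def mix_channel_stats(x, y, lam, eps=1e-6):
    """Blend the per-channel (mean, std) statistics of feature map x
    with those of another instance y, then re-apply them to x.

    x, y: arrays of shape (C, H, W); lam in [0, 1] is the mixing weight
    (lam = 1 keeps x's own statistics unchanged).
    Hypothetical sketch -- not the paper's actual CCN module.
    """
    mu_x = x.mean(axis=(1, 2), keepdims=True)
    sig_x = x.std(axis=(1, 2), keepdims=True) + eps
    mu_y = y.mean(axis=(1, 2), keepdims=True)
    sig_y = y.std(axis=(1, 2), keepdims=True) + eps

    # Interpolated statistics drawn from both instances/modalities
    mu_mix = lam * mu_x + (1.0 - lam) * mu_y
    sig_mix = lam * sig_x + (1.0 - lam) * sig_y

    # Normalize x with its own stats, then re-style with the mixture
    return sig_mix * (x - mu_x) / sig_x + mu_mix

rng = np.random.default_rng(0)
rgb_feat = rng.normal(size=(4, 8, 8))  # e.g. shallow-layer RGB features
ir_feat = rng.normal(size=(4, 8, 8))   # e.g. shallow-layer IR features
lam = rng.beta(0.1, 0.1)               # Beta-sampled weight (assumed)
mixed = mix_channel_stats(rgb_feat, ir_feat, lam)
```

Because the mixing acts only on first- and second-order channel statistics, spatial content is preserved, which is why such a module can sit in shallow layers as a plug-and-play augmentation without changing the training loss.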