Abstract

Visible-Infrared person Re-Identification (VI-ReID) matches identities across non-overlapping visible and infrared camera sets in intelligent surveillance systems, and faces large instance variations caused by the modality discrepancy. Existing methods employ various network structures to extract modality-invariant features. In contrast, we propose a novel framework, named Dual-Semantic Consistency Learning Network (DSCNet), which attributes the modality discrepancy to channel-level semantic inconsistency. DSCNet optimizes channel consistency from two aspects: fine-grained inter-channel semantics and comprehensive inter-modality semantics. Furthermore, we propose Joint Semantics Metric Learning to simultaneously optimize the distribution of the channel- and modality-level feature embeddings, jointly exploiting the correlation between channel-specific and modality-specific semantics in a fine-grained manner. Experiments on the SYSU-MM01 and RegDB datasets validate that DSCNet outperforms current state-of-the-art methods. On the more challenging SYSU-MM01 dataset, our network achieves 73.89% Rank-1 accuracy and 69.47% mAP. Our code is available at https://github.com/bitreidgroup/DSCNet.
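The abstract attributes the modality discrepancy to channel-level semantic inconsistency between visible and infrared features. The paper's actual loss is not given here, but the general idea of a channel-consistency objective can be illustrated with a minimal sketch: penalize the gap between per-channel feature statistics of the two modalities so that each channel encodes the same semantics in both. The function name and formulation below are illustrative assumptions, not DSCNet's implementation.

```python
import numpy as np

def channel_consistency_loss(feat_vis, feat_ir):
    """Hypothetical channel-level consistency penalty (illustration only).

    feat_vis, feat_ir: (batch, channels) embeddings of the same identities
    from the visible and infrared branches. Returns the mean squared gap
    between per-channel mean activations, which is zero only when each
    channel has matching first-order statistics across modalities.
    """
    mu_vis = feat_vis.mean(axis=0)  # per-channel mean, visible branch
    mu_ir = feat_ir.mean(axis=0)    # per-channel mean, infrared branch
    return float(np.mean((mu_vis - mu_ir) ** 2))

# Identical channel statistics yield zero penalty; a shifted branch does not.
rng = np.random.default_rng(0)
f = rng.normal(size=(8, 16))
print(channel_consistency_loss(f, f.copy()))  # 0.0
print(channel_consistency_loss(f, f + 1.0) > 0.0)  # True
```

In practice such a term would be one component of a larger objective alongside identity and metric-learning losses, which is consistent with the abstract's Joint Semantics Metric Learning operating on both channel- and modality-level embeddings.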
