Abstract

Visible-Infrared person Re-Identification (VI-ReID) matches identities across non-overlapping visible and infrared camera sets in intelligent surveillance systems, and faces large instance variations caused by the modality discrepancy. Existing methods employ various network structures to extract modality-invariant features. In contrast, we propose a novel framework, named Dual-Semantic Consistency Learning Network (DSCNet), which attributes the modality discrepancy to channel-level semantic inconsistency. DSCNet optimizes channel consistency from two aspects: fine-grained inter-channel semantics and comprehensive inter-modality semantics. Furthermore, we propose Joint Semantics Metric Learning to simultaneously optimize the distribution of the channel- and modality-level feature embeddings, jointly exploiting the correlation between channel-specific and modality-specific semantics in a fine-grained manner. Experiments on the SYSU-MM01 and RegDB datasets validate that DSCNet outperforms current state-of-the-art methods. On the more challenging SYSU-MM01 dataset, our network achieves 73.89% Rank-1 accuracy and 69.47% mAP. Our code is available at https://github.com/bitreidgroup/DSCNet.
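The abstract attributes the modality discrepancy to channel-level semantic inconsistency between visible and infrared features. The paper's actual loss is not given here, but the general idea of a channel-consistency objective can be illustrated with a minimal sketch: penalize the gap between per-channel feature statistics of the two modalities so that each channel encodes the same semantics in both. The function name and formulation below are illustrative assumptions, not DSCNet's implementation.

```python
import numpy as np

def channel_consistency_loss(feat_vis, feat_ir):
    """Hypothetical channel-level consistency penalty (illustration only).

    feat_vis, feat_ir: (batch, channels) embeddings of the same identities
    from the visible and infrared branches. Returns the mean squared gap
    between per-channel mean activations, which is zero only when each
    channel has matching first-order statistics across modalities.
    """
    mu_vis = feat_vis.mean(axis=0)  # per-channel mean, visible branch
    mu_ir = feat_ir.mean(axis=0)    # per-channel mean, infrared branch
    return float(np.mean((mu_vis - mu_ir) ** 2))

# Identical channel statistics yield zero penalty; a shifted branch does not.
rng = np.random.default_rng(0)
f = rng.normal(size=(8, 16))
print(channel_consistency_loss(f, f.copy()))  # 0.0
print(channel_consistency_loss(f, f + 1.0) > 0.0)  # True
```

In practice such a term would be one component of a larger objective alongside identity and metric-learning losses, which is consistent with the abstract's Joint Semantics Metric Learning operating on both channel- and modality-level embeddings.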
