Abstract

Visible-thermal person re-identification (VT-ReID) is an image retrieval task that aims to match a target pedestrian across the visible and thermal modalities. However, intra-class variations and the cross-modality discrepancy degrade VT-ReID performance. Recent methods focus on extracting discriminative local features from each modality to alleviate these problems, but they ignore the semantic relations between the local features of the two modalities, i.e., the spatial relations and channel relations. In this paper, we propose a feature aggregation module (FAM) to enhance the correlations between local features, including spatial dependencies and channel dependencies. Furthermore, FAM performs cross-modality feature aggregation on the enhanced features to reduce the cross-modality discrepancy. Moreover, we propose a near neighbor cross-modality loss (NNCLoss) that mines feature consistency between modalities by constructing a cross-modality near neighbor set, which facilitates feature alignment between the two modalities. Extensive experiments on two datasets demonstrate the superior performance of our approach over existing state-of-the-art methods.
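The abstract gives no implementation details, so the following is only a minimal sketch of how the two components might be realized. Everything here is our assumption rather than the paper's specification: the use of PyTorch, a squeeze-and-excitation-style channel gate and a 1x1-convolution spatial gate for the "spatial and channel dependencies," simple averaging as the cross-modality aggregation step, and Euclidean nearest-neighbor mining for NNCLoss. All class, function, and parameter names (FeatureAggregationModule, nnc_loss, reduction, k) are hypothetical.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureAggregationModule(nn.Module):
    # Hypothetical FAM sketch: enhance each modality's local features
    # with channel and spatial attention, then aggregate the enhanced
    # features across modalities to reduce the modality gap.
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Channel attention: squeeze-and-excitation-style gating.
        self.channel_fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )
        # Spatial attention: a 1x1 conv producing a per-location gate.
        self.spatial_conv = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def enhance(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) local feature map of one modality.
        b, c, _, _ = x.shape
        ch_gate = self.channel_fc(x.mean(dim=(2, 3))).view(b, c, 1, 1)
        sp_gate = self.spatial_conv(x)  # (B, 1, H, W)
        return x * ch_gate * sp_gate

    def forward(self, x_vis: torch.Tensor, x_thr: torch.Tensor):
        # Enhance each modality, then fuse (here: a simple average)
        # and feed the shared component back to both branches.
        v, t = self.enhance(x_vis), self.enhance(x_thr)
        fused = 0.5 * (v + t)
        return v + fused, t + fused

def nnc_loss(f_vis: torch.Tensor, f_thr: torch.Tensor, k: int = 4):
    # Hypothetical near-neighbor cross-modality loss: for each visible
    # feature, pull it toward its k nearest thermal neighbors, which
    # encourages cross-modality feature alignment.
    f_vis = F.normalize(f_vis, dim=1)   # (N, D) visible features
    f_thr = F.normalize(f_thr, dim=1)   # (M, D) thermal features
    dist = torch.cdist(f_vis, f_thr)    # (N, M) pairwise distances
    nn_dist, _ = dist.topk(k, dim=1, largest=False)
    return nn_dist.mean()

In use, f_vis and f_thr would presumably be pooled local features from the two branches of a two-stream backbone, with nnc_loss added alongside the usual identification and triplet objectives; the actual paper may construct the near neighbor set differently, e.g. restricted to same-identity samples.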
