Abstract

• We propose a transformer-based framework for cross-camera person re-identification.
• A novel module is proposed to obtain effective patch tokens from input images.
• The proposed framework outperforms state-of-the-art methods on three datasets.

As an essential task in video surveillance, person re-identification (Re-ID) suffers from appearance variations across different cameras. In this paper, we propose an effective transformer-based Re-ID framework for learning identity-discriminative and camera-invariant feature representations. In contrast to the recent trend of using generative models to augment training data and enhance invariance to input variations, we show that explicitly designing a novel adversarial loss from the perspective of feature representation learning effectively penalizes the distribution discrepancy across multiple camera domains. Recently, the pure transformer model has gained much attention due to its strong representation capabilities. We employ a pure transformer encoder to extract a global feature vector from the patch tokens of each person image. Notably, a novel cross-patch encoder is introduced to capture structural information between image patches. Extensive experiments on three challenging datasets demonstrate the effectiveness and superiority of the proposed learning framework.
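To make the described pipeline concrete, the sketch below shows one common way such a system can be wired together in PyTorch: a ViT-style patch encoder producing a global [CLS] feature, an identity classification head, and an adversarial camera classifier behind a gradient-reversal layer that penalizes camera-domain discrepancy. This is a minimal illustration under assumptions, not the authors' code: the class names, hyperparameters, and the stand-in for the cross-patch encoder (a plain transformer layer here) are all hypothetical, since the paper's exact designs are not given in the abstract.

```python
import torch
import torch.nn as nn


class GradientReversal(torch.autograd.Function):
    """Identity on the forward pass; reverses and scales gradients on backward."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Flipping the gradient makes the encoder *maximize* camera confusion.
        return -ctx.lambd * grad_output, None


class ReIDTransformer(nn.Module):
    """Hypothetical sketch: pure transformer encoder + adversarial camera branch."""

    def __init__(self, num_ids, num_cameras, dim=768, depth=12, heads=12,
                 patch=16, img=(256, 128)):
        super().__init__()
        n_patches = (img[0] // patch) * (img[1] // patch)
        # Patch tokenization via a strided convolution (standard ViT embedding).
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, n_patches + 1, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)  # pure transformer encoder
        # Stand-in for the paper's cross-patch encoder: one extra attention layer
        # that mixes information across patch tokens.
        self.cross_patch = nn.TransformerEncoderLayer(dim, heads, dim * 4,
                                                      batch_first=True)
        self.id_head = nn.Linear(dim, num_ids)       # identity-discriminative branch
        self.cam_head = nn.Linear(dim, num_cameras)  # adversarial camera branch

    def forward(self, x, grl_lambda=1.0):
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2)  # B x N x dim
        cls = self.cls_token.expand(x.size(0), -1, -1)
        tokens = self.encoder(torch.cat([cls, tokens], dim=1) + self.pos_embed)
        tokens = self.cross_patch(tokens)  # structural mixing between patches
        feat = tokens[:, 0]                # global feature from the [CLS] token
        id_logits = self.id_head(feat)
        cam_logits = self.cam_head(GradientReversal.apply(feat, grl_lambda))
        return feat, id_logits, cam_logits
```

In a typical training loop for this kind of setup, a cross-entropy identity loss is applied to `id_logits` while a cross-entropy camera loss is applied to `cam_logits`; because of the gradient reversal, minimizing the camera loss at the classifier pushes the encoder toward camera-invariant features. This is a standard instantiation of adversarial domain-invariant learning and may differ from the paper's specific adversarial loss.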
