Abstract

Unsupervised domain adaptive (UDA) person re-identification (ReID) aims to improve a model's generalization from a labeled source domain to an unlabeled target domain, which requires a strong and robust method for extracting discriminative pedestrian features. Recently, transformer-based methods have achieved strong performance on person ReID. However, due to the domain gap between ImageNet and ReID datasets, the Vision Transformer (ViT) requires a large pre-training dataset to perform well. To this end, we first investigate self-supervised learning methods with ViTs pre-trained on the LUPerson dataset, and find that they significantly outperform ImageNet supervised pre-training on ReID tasks. A Catastrophic Forgetting Score (CFS) is also used to select a subset of LUPerson, which reduces training time and improves performance. We then propose a channel-wise self-attention module to reduce the computational cost on the class token, and a dual prototype contrastive learning scheme to fully exploit hard features in the memory bank under unsupervised domain adaptation. Finally, we achieve state-of-the-art performance on Market-1501 and MSMT17: our model reaches 91.5%/69.6% mAP on Market-1501/MSMT17 for supervised ReID, and 90.7%/57.4% mAP for MS2MA/MA2MS UDA ReID.
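
To make the channel-wise self-attention idea concrete, below is a minimal PyTorch sketch of one common way to attend over channels rather than spatial tokens; it illustrates the general technique only, and the class name ChannelWiseSelfAttention, the head layout, and the 1/sqrt(N) scaling are assumptions rather than the paper's exact module. Because the attention matrix is head_dim x head_dim instead of N x N, its cost does not grow quadratically with the token count, which is what makes the class-token computation cheaper.

    import torch
    import torch.nn as nn

    class ChannelWiseSelfAttention(nn.Module):
        # Hypothetical sketch: attention is computed between channel
        # slices rather than between spatial tokens.
        def __init__(self, dim, num_heads=8):
            super().__init__()
            assert dim % num_heads == 0
            self.num_heads = num_heads
            self.qkv = nn.Linear(dim, dim * 3, bias=False)
            self.proj = nn.Linear(dim, dim)

        def forward(self, x):
            B, N, C = x.shape  # (batch, tokens, dim)
            qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads)
            # q, k, v: each (B, heads, head_dim, N)
            q, k, v = qkv.permute(2, 0, 3, 4, 1).unbind(0)
            # Channel-wise attention: a head_dim x head_dim matrix per head,
            # independent of the number of tokens N (assumed 1/sqrt(N) scaling).
            attn = (q @ k.transpose(-2, -1)) * (N ** -0.5)
            attn = attn.softmax(dim=-1)
            out = (attn @ v).permute(0, 3, 1, 2).reshape(B, N, C)
            return self.proj(out)

    # Example with assumed ViT-Base shapes: 196 patch tokens + 1 class token.
    x = torch.randn(2, 197, 768)
    y = ChannelWiseSelfAttention(768)(x)  # -> (2, 197, 768)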
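
Similarly, the following is a hedged sketch of what a dual prototype contrastive objective could look like: two InfoNCE-style terms computed against two prototype memory banks, assumed here to hold cluster centroids and hard features respectively. The function names, the temperature tau, the weighting lam, and the bank contents are illustrative assumptions, not the paper's definition.

    import torch
    import torch.nn.functional as F

    def proto_contrastive(feat, protos, pos_idx, tau=0.05):
        # InfoNCE over one prototype bank: the prototype at pos_idx is the
        # positive for each query; all other prototypes act as negatives.
        feat = F.normalize(feat, dim=-1)      # (B, D) query features
        protos = F.normalize(protos, dim=-1)  # (K, D) bank prototypes
        logits = feat @ protos.t() / tau      # (B, K) scaled similarities
        return F.cross_entropy(logits, pos_idx)

    def dual_prototype_loss(feat, cluster_protos, cluster_idx,
                            hard_protos, hard_idx, lam=0.5):
        # Assumed combination: a cluster-centroid bank plus a hard-feature
        # bank, mixed with an assumed weight lam.
        return (lam * proto_contrastive(feat, cluster_protos, cluster_idx)
                + (1 - lam) * proto_contrastive(feat, hard_protos, hard_idx))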
