Abstract

Person re-identification (re-ID) is an important topic in computer vision. We study unsupervised person re-ID, which aims to identify a target identity across multiple non-overlapping cameras for intelligent surveillance systems; its main challenge lies in learning discriminative features without leveraging any annotated data. In this paper, we apply the Vision Transformer (ViT) to the unsupervised person re-ID task. Combined with multi-label classification, our model outperforms most CNN-based methods. Evaluated on Market-1501, DukeMTMC-reID, and MSMT17, the proposed model achieves 56.6%, 49.4%, and 14.5% mAP, respectively, outperforming the baseline by a clear margin and reaching state-of-the-art performance among unsupervised re-ID methods.
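To illustrate the multi-label classification idea mentioned above, the sketch below shows one common memory-bank formulation used in unsupervised re-ID: each image's embedding (here a stand-in for a ViT [CLS] feature) is compared against a memory of per-image slots, positive multi-labels are assigned by similarity thresholding, and a binary cross-entropy loss is computed over all slots. All function names, the threshold, and the temperature are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def cosine_sim(a, b):
    """Pairwise cosine similarity between rows of a and rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def multilabel_targets(feats, memory, thresh=0.6):
    """Assign multi-labels: an image is positively labeled with every
    memory slot whose cosine similarity exceeds `thresh` (illustrative)."""
    return (cosine_sim(feats, memory) > thresh).astype(np.float32)

def multilabel_bce_loss(feats, memory, targets, temp=0.1):
    """Binary cross-entropy over all memory slots, treating each slot
    as an independent binary classifier (temperature-scaled sigmoid)."""
    p = 1.0 / (1.0 + np.exp(-cosine_sim(feats, memory) / temp))
    eps = 1e-7
    return -np.mean(targets * np.log(p + eps)
                    + (1.0 - targets) * np.log(1.0 - p + eps))

rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8))    # batch of embeddings (ViT stand-in)
memory = rng.normal(size=(10, 8))  # one memory slot per training image
targets = multilabel_targets(feats, memory)
loss = multilabel_bce_loss(feats, memory, targets)
```

In a full pipeline, `memory` would be updated with a running average of each image's feature after every step, and the thresholding rule would typically be refined (e.g. by cycle-consistency checks) rather than a fixed cutoff.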
