Abstract

Extracting meaningful representations is a key challenge in the person re-identification (re-ID) task, especially in the absence of ground-truth labels. However, existing unsupervised approaches simply use pseudo labels generated by clustering to supervise the re-ID model, and thus do not fully exploit the semantic information present in the data itself. This also limits the representation capability of the learned models. To address this problem, we propose mask prediction (MaskPre) as a pretext task for unsupervised re-ID, so that the clustering network can capture more semantic information and automatically separate the images into semantic clusters. Specifically, MaskPre masks region-level features with a dynamic DropBlock layer to generate differently masked views of a single image. To predict the masked regions and bridge the domain gap across views, we design a mask prediction head and a moving-average model that learn visual consistency from still images and temporal consistency over the training process. Meanwhile, we optimize the model by grouping the two masked views of an image into the same cluster, which enhances consistency across views. Experimental results on three public benchmark datasets show that the proposed method outperforms existing state-of-the-art approaches.
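Two of the building blocks named above can be sketched as follows. The sketch generates two region-masked views of a feature map and applies a moving-average (mean-teacher style) parameter update. It is a minimal NumPy sketch, assuming the standard DropBlock formulation on a (C, H, W) feature map. The function names, block size, drop rate and momentum are illustrative assumptions, not the authors' settings. The "dynamic" aspect of the DropBlock layer, the mask prediction head and the clustering step are not modeled here.

```python
import numpy as np


def dropblock_mask(height, width, block_size, drop_prob, rng):
    """DropBlock-style binary mask: square blocks of side block_size are zeroed."""
    assert 1 <= block_size <= min(height, width), "block must fit in the feature map"
    mask = np.ones((height, width), dtype=np.float32)
    if drop_prob <= 0.0:
        return mask
    # Positions where a full block fits (used as top-left corners).
    n_h, n_w = height - block_size + 1, width - block_size + 1
    # Seed rate chosen so that roughly drop_prob of the units are dropped (overlaps ignored).
    gamma = drop_prob / block_size**2 * (height * width) / (n_h * n_w)
    seeds = rng.random((n_h, n_w)) < gamma
    for i, j in zip(*np.nonzero(seeds)):
        mask[i:i + block_size, j:j + block_size] = 0.0
    return mask


def masked_views(features, block_size, drop_prob, rng, n_views=2):
    """Apply an independent region mask to a (C, H, W) feature map for each view."""
    _, h, w = features.shape
    views = []
    for _ in range(n_views):
        mask = dropblock_mask(h, w, block_size, drop_prob, rng)
        # Rescale so the overall activation magnitude is preserved, as DropBlock does.
        scale = mask.size / max(float(mask.sum()), 1.0)
        views.append(features * mask[None, :, :] * scale)
    return views


def ema_update(teacher, student, momentum=0.999):
    """Moving-average update: teacher <- m * teacher + (1 - m) * student (momentum is assumed)."""
    return {k: momentum * teacher[k] + (1.0 - momentum) * student[k] for k in teacher}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical feature map: C=8 channels on a 16x8 grid (a tall person-crop layout).
    feats = rng.standard_normal((8, 16, 8)).astype(np.float32)
    v1, v2 = masked_views(feats, block_size=3, drop_prob=0.2, rng=rng)
    print("fraction kept:", (v1 != 0).mean(), (v2 != 0).mean())
```

Because the two masks are sampled independently, each view hides different body regions. A consistency objective between the views, or between a view and the moving-average model's output, therefore has to rely on the regions that remain visible.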
