Several unsupervised approaches to person re-identification (ReID) have been proposed recently, because annotating samples across cameras is time-consuming. However, most of these methods focus on the appearance content of each sample and seldom take the structure relations among samples into account when learning feature representations, even though these relations can provide a valuable guide for representation learning. As a result, hard samples may not be handled well, because the information carried by a single sample is limited or even misleading. To address this issue, we propose a Relation-Preserving Feature Embedding (RPE) model that leverages structure relations among samples to boost the performance of unsupervised person ReID without requiring any sample annotations. RPE integrates sample content and the neighborhood structure relations among samples into the learning of feature embeddings by combining the advantages of the autoencoder and the graph autoencoder. Specifically, a relation and content information fusion (RCIF) module dynamically merges information from the content and relation levels for feature embedding learning. In addition, because identity labels are unavailable, we optimize the RPE model with an adaptive strategy that updates the affinity relations among samples instead of reconstructing the whole affinity matrix, which is better suited to the unsupervised ReID task. Rigorous experiments on three widely used large-scale person ReID benchmarks demonstrate the superiority of the proposed method over current state-of-the-art unsupervised methods.