Fully Unsupervised Person Re-Identification via Selective Contrastive Learning
Person re-identification (ReID) aims to retrieve images of the same person across cameras. Existing fully supervised person ReID methods usually suffer from poor generalization capability caused by domain gaps. Unsupervised person ReID has attracted a lot of attention recently, because it works without intensive manual annotation and thus shows great potential in adapting to new conditions. Representation learning plays a critical role in unsupervised person ReID. In this work, we propose a novel selective contrastive learning framework for fully unsupervised feature learning. Specifically, different from traditional contrastive learning strategies, we propose to use multiple positives and adaptively selected negatives to define the contrastive loss, enabling the model to learn a feature embedding with stronger identity-discriminative representation. Moreover, we propose to jointly leverage global and local features to construct three dynamic memory banks, among which the global and local ones are used for pairwise similarity computation and the mixture memory bank is used for contrastive loss definition. Experimental results demonstrate the superiority of our method in unsupervised person ReID compared with the state of the art. Our code is available at https://github.com/pangbo1997/Unsup_ReID.git .
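The multi-positive, selected-negative loss described above can be pictured with a minimal numpy sketch. This is not the paper's implementation: the top-k similarity rule in `select_hard_negatives` and both function names are assumptions chosen for illustration, and features are assumed L2-normalized.

```python
import numpy as np

def select_hard_negatives(anchor, candidates, k):
    """One plausible adaptive rule: keep the k candidates most
    similar to the anchor, i.e. the hardest negatives."""
    sims = candidates @ anchor
    return candidates[np.argsort(-sims)[:k]]

def multi_positive_loss(anchor, positives, negatives, tau=0.07):
    """InfoNCE-style loss averaged over several positives instead
    of a single one; each positive competes against all negatives."""
    pos = positives @ anchor / tau   # (P,) positive similarities
    neg = negatives @ anchor / tau   # (N,) negative similarities
    # -log( exp(p) / (exp(p) + sum_j exp(n_j)) ), averaged over p
    terms = [np.logaddexp.reduce(np.append(neg, p)) - p for p in pos]
    return float(np.mean(terms))
```

With positives aligned to the anchor and negatives pointing away, the loss is small; swapping the two roles makes it large, which is the behavior a contrastive objective needs.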
- Research Article
- 10.11834/jig.230022
- Jan 1, 2023
- Journal of Image and Graphics
Person re-identification (person re-ID) aims to solve the problem of association and matching of target person images across multiple cameras within the camera network of a surveillance system, especially when face, iris and other biometric recognition fail under non-cooperative application scenarios, and has become one of the key components and supporting techniques for intelligent video surveillance systems and for applications in intelligent public security and smart cities. Recently, person re-ID has attracted more and more attention from both academia and industry, and has made rapid progress. To meet the technical challenges and practical application needs of person re-ID, this paper first gives a brief introduction to its development history, commonly used datasets and evaluation metrics. Then, recent progress in hot research topics of person re-ID is extensively reviewed and analyzed, including: occluded person re-ID, unsupervised person re-ID, virtual data generation, domain generalization, cloth-changing person re-ID, cross-modal person re-ID and person search. First, to address the impact of occlusions on the performance of person re-ID, recent progress in occluded person re-ID is reviewed: the popular datasets for occluded person re-ID are briefly introduced, and the two major categories of occluded person re-ID models are then surveyed.
Second, facing the challenges of low-efficiency, high-cost data annotation and the great impact of training data on the performance of person re-ID, unsupervised person re-ID and virtual data generation emerge as two hot topics. The paper elaborates recent advances in unsupervised person re-ID, which can be classified into three major categories: pseudo-label-generation-based models, domain-transfer-based models, and other related models that take into consideration extra information, such as time-stamps and camera labels, beside the person image. Third, state-of-the-art works on virtual data generation for person re-ID are reviewed, with detailed introductions and performance comparisons of the major virtual datasets. Fourth, recent research on domain-generalization person re-ID is reviewed and classified into five categories: batch/instance normalization models, domain-invariant feature learning models, deep-learning-based explicit image matching models, mixture-of-experts models and meta-learning-based models. Fifth, since most current person re-ID models largely depend on the color appearance of persons' clothes, cloth-changing person re-ID becomes a challenging setting, in which person images can exhibit large intra-class variation and small inter-class variation. Typical cloth-changing person re-ID datasets are introduced and recent research is then reviewed: models in the first category explicitly introduce extra cloth-appearance-independent features such as contour and face, while those in the second category try to decouple cloth features from person ID features.
Sixth, to compensate for the drawbacks of conventional person re-ID on visible-light/RGB images in naturally complex scenes, such as poor lighting conditions at night, the state of the art in cross-modal person re-ID, which aims to resolve the problem through heterogeneous data other than visible RGB images, is reviewed: commonly used cross-modal person re-ID datasets are briefly introduced first, followed by four sub-categories of models according to the modalities employed, namely RGB-infrared image person re-ID, RGB image-text person re-ID, RGB image-sketch person re-ID, and RGB-depth image person re-ID. Seventh, since existing person re-ID benchmarks and methods mainly focus on matching cropped person images between queries and candidates, which differs from practical scenarios where bounding-box annotations of persons are often unavailable, person search, which jointly considers person detection and person re-ID in a single framework, becomes a new hot research topic. The typical datasets and recent progress on person search are reviewed. Finally, the existing challenges and development trends of person re-ID techniques are discussed. It is hoped that this summary and analysis can provide a reference for researchers carrying out research on person re-ID and promote the progress of person re-ID techniques and applications.
- Conference Article
- 10.1109/icceai52939.2021.00063
- Aug 1, 2021
Most current methods for person re-identification (Re-ID) are supervised and have yielded good results, but they require manual annotation of training data. Especially for large datasets, manual annotation is too costly and fully pairwise labels are difficult to obtain. Unsupervised learning therefore becomes a necessary trend for person Re-ID, and we solve the task with an unsupervised learning method. Moreover, global features capture the spatial integrity of person features, while local ones help to highlight discriminative features of different patches. Therefore, we propose a fine- and coarse-grained unsupervised (FCU) learning framework with global and local feature-learning branches to solve the Re-ID task. Specifically, the local branch extracts patches from a feature map learned by a PatchNet network and learns their fine-grained features, pulling similar patches together and pushing dissimilar ones apart. The global branch maximizes the diversity between classes with a repelled loss and the similarity within classes with an attracted loss; the similarity and diversity in the unlabeled datasets are then used as information for unsupervised cluster merging and for learning coarse-grained features. The two branches jointly achieve the effect of increasing inter-class differences and intra-class similarity. Extensive experiments verify the superiority of our method for unsupervised person re-identification.
- Research Article
- 10.1109/tip.2024.3514360
- Jan 1, 2024
- IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
Unsupervised person re-identification (Re-ID) aims to learn semantic representations for person retrieval without using identity labels. Most existing methods generate fine-grained patch features to reduce noise in global feature clustering. However, these methods often compromise the discriminative semantic structure and overlook the semantic consistency between the patch and global features. To address these problems, we propose a Person Intrinsic Semantic Learning (PISL) framework with diffusion model for unsupervised person Re-ID. First, we design the Spatial Diffusion Model (SDM), which performs a denoising diffusion process from noisy spatial transformer parameters to semantic parameters, enabling the sampling of patches with intrinsic semantic structure. Second, we propose the Semantic Controlled Diffusion (SCD) loss to guide the denoising direction of the diffusion model, facilitating the generation of semantic patches. Third, we propose the Patch Semantic Consistency (PSC) loss to capture semantic consistency between the patch and global features, refining the pseudo-labels of global features. Comprehensive experiments on three challenging datasets show that our method surpasses current unsupervised Re-ID methods. The source code will be publicly available at https://github.com/taoxuefong/Diffusion-reid.
- Research Article
- 10.3390/rs16020422
- Jan 22, 2024
- Remote Sensing
Unsupervised person re-identification (Re-ID) aims to match the query image of a person with images in the gallery without the use of supervision labels. Most existing methods generate pseudo-labels through clustering algorithms for contrastive learning, which inevitably assigns noisy labels to samples. In addition, methods that only apply contrastive learning at the cluster level fail to fully consider instance-level relationships. Motivated by this, we propose a joint contrastive learning (JCL) framework for unsupervised person Re-ID. Our method creates two memory banks to store the features of cluster centroids and instances, and applies cluster-level and instance-level contrastive learning, respectively, to jointly optimize the neural networks. The cluster-level contrastive loss promotes feature compactness within the same cluster and reinforces identity similarity. The instance-level contrastive loss distinguishes easily confused samples. In addition, we use a WaveBlock attention module (WAM), which continuously waves feature map blocks and introduces attention mechanisms to produce more robust feature representations of a person without considerable information loss. Furthermore, we enhance the quality of our clustering by leveraging camera-label information to eliminate clusters containing captures from a single camera. Extensive experimental results on two widely used person Re-ID datasets verify the effectiveness of our JCL method. Meanwhile, we also use two remote sensing datasets to demonstrate the generalizability of our method.
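The two-memory-bank design this abstract describes, a centroid bank plus an instance bank, each momentum-updated, combined through cluster-level and instance-level InfoNCE terms, can be sketched roughly as follows. This is a toy numpy approximation, not the authors' code; the class name, the momentum rule, and the weighting `lam` are assumptions.

```python
import numpy as np

def l2n(x):
    """L2-normalize along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

class JointMemory:
    """Toy joint memory: a bank of cluster centroids plus a bank of
    per-image instance features, both updated with momentum."""

    def __init__(self, centroids, instances, momentum=0.2):
        self.centroids = l2n(np.array(centroids, dtype=float))
        self.instances = l2n(np.array(instances, dtype=float))
        self.m = momentum

    def update(self, idx, label, feat):
        """Move the image's instance entry and its cluster centroid
        toward the newly extracted feature."""
        feat = l2n(np.asarray(feat, dtype=float))
        self.instances[idx] = l2n(self.m * self.instances[idx] + (1 - self.m) * feat)
        self.centroids[label] = l2n(self.m * self.centroids[label] + (1 - self.m) * feat)

    def loss(self, idx, label, feat, tau=0.05, lam=0.5):
        """Weighted sum of a cluster-level and an instance-level InfoNCE term."""
        feat = l2n(np.asarray(feat, dtype=float))
        c_logits = self.centroids @ feat / tau
        i_logits = self.instances @ feat / tau
        c = np.logaddexp.reduce(c_logits) - c_logits[label]
        i = np.logaddexp.reduce(i_logits) - i_logits[idx]
        return float(lam * c + (1 - lam) * i)
```

A feature that agrees with its assigned centroid and stored instance entry yields a small loss; a feature pulled toward the wrong cluster yields a large one, which is what jointly optimizing both banks exploits.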
- Research Article
- 10.1049/sfw2/6394038
- Jan 1, 2025
- IET Software
In recent years, some methods have utilized a transformer as the backbone to model long-range context dependencies, reflecting a prevailing trend in unsupervised person re-identification (Re-ID). However, they only explore global information through interactive learning within the transformer framework, ignoring part-level information in the interaction process for pedestrian images. In this study, we present a novel transformer network for unsupervised person Re-ID, a stripe-driven fusion transformer (SDFT), designed to simultaneously capture global and part-level interactions when modeling long-range context dependencies. Meanwhile, we present a stripe-driven regularization (SDR) that constrains the part-aggregation features and the global features by considering the consistency principle at both the feature and cluster levels, aiming to improve the representational capacity of the features. Furthermore, to investigate the relationships between local regions of pedestrian images, we present a stripe-driven contrastive loss (SDCL) to learn discriminative part features from the perspectives of pedestrian identity and stripes. The proposed method has been extensively validated on publicly available unsupervised person Re-ID benchmarks, and the experimental results confirm its superiority and effectiveness.
- Research Article
- 10.1016/j.ijleo.2023.170718
- Feb 25, 2023
- Optik
Multiple camera styles learning for unsupervised person re-identification
- Research Article
- 10.3390/s23063259
- Mar 20, 2023
- Sensors (Basel, Switzerland)
State-of-the-art purely unsupervised person re-ID methods first cluster all the images into multiple clusters and assign each clustered image a pseudo-label based on the clustering result. They then construct a memory dictionary that stores all the clustered images and train the feature extraction network on this dictionary. All these methods directly discard the un-clustered outliers during clustering and train the network only on the clustered images. The un-clustered outliers, however, are complicated images, showing varied clothes and poses, low resolution, severe occlusion, and so on, which are common in real-world applications. Models trained only on clustered images are therefore less robust and unable to handle complicated images. We construct a memory dictionary that covers both clustered and un-clustered images, and design a corresponding contrastive loss that considers both kinds of images. The experimental results show that this memory dictionary and contrastive loss improve person re-ID performance, demonstrating the effectiveness of considering un-clustered complicated images in unsupervised person re-ID.
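One way to picture a dictionary that keeps both clustered images and un-clustered outliers is to give every outlier its own single-image entry alongside the cluster centroids, so a single contrastive loss covers both kinds of images. The numpy sketch below is an illustrative approximation under that assumption, not the paper's implementation; it follows the DBSCAN convention of label -1 for outliers.

```python
import numpy as np

def l2n(x):
    """L2-normalize along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def build_dictionary(feats, labels):
    """Dictionary entries: one centroid per cluster, plus one entry
    per un-clustered outlier (label -1, the DBSCAN convention).
    Returns the entry matrix and each image's entry index."""
    labels = np.asarray(labels)
    entries, entry_of = [], np.empty(len(feats), dtype=int)
    for c in sorted(set(labels[labels >= 0].tolist())):
        entry_of[labels == c] = len(entries)
        entries.append(l2n(feats[labels == c].mean(axis=0)))
    for i in np.where(labels < 0)[0]:
        entry_of[i] = len(entries)
        entries.append(l2n(feats[i]))
    return np.stack(entries), entry_of

def dictionary_loss(query, entry_id, entries, tau=0.05):
    """Single contrastive loss over all entries, clustered or not."""
    logits = entries @ l2n(query) / tau
    return float(np.logaddexp.reduce(logits) - logits[entry_id])
```

Because outliers stay in the denominator for every query, training no longer silently ignores the complicated images the abstract highlights.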
- Research Article
- 10.15625/1813-9663/23018
- Sep 23, 2025
- Journal of Computer Science and Cybernetics
Person re-identification (ReID) plays a crucial role in computer-vision-based surveillance systems, enabling the accurate identification of individuals across multiple camera views. Traditional convolutional neural network (CNN)-based approaches, such as those utilizing ResNet-50, struggle to capture long-range dependencies and contextual relationships, limiting their effectiveness in diverse real-world scenarios. To overcome these challenges, recent advancements have explored Vision Transformer (ViT)-based architectures, leveraging self-attention mechanisms for enhanced feature representation. In this research, we introduce a ViT-based framework, namely ViTC-UReID, for unsupervised person ReID, incorporating a camera-aware proxy learning mechanism to improve feature consistency across different camera viewpoints. Moreover, ViTC-UReID uses clustering algorithms to generate pseudo-labels for samples in the training datasets. Our approach significantly enhances cross-camera adaptation, reducing domain-shift effects while maintaining strong feature discrimination. We evaluate our method on three widely used benchmarks, Market-1501, MSMT17, and CUHK03, demonstrating its superior performance compared to existing state-of-the-art unsupervised methods, particularly those utilizing camera identity cues. Furthermore, our model achieves accuracy competitive with fully supervised methods, highlighting the effectiveness of transformer-based representations in complex person ReID scenarios. Our findings reinforce the growing potential of unsupervised person ReID methods and demonstrate that ViT architectures combined with camera-aware learning can drive substantial improvements in person ReID.
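Camera-aware proxy learning is commonly realized by splitting each identity cluster into per-camera proxies and pulling a query toward its own cluster's proxies from every camera, so that cross-camera variation is bridged explicitly. The numpy sketch below illustrates that general idea; the function names and the exact loss form are assumptions for illustration, not ViTC-UReID's code.

```python
import numpy as np

def l2n(x):
    """L2-normalize along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def camera_proxies(feats, cluster_ids, cam_ids):
    """One proxy per (cluster, camera) pair: the normalized mean of
    the features that camera contributed to that cluster."""
    cluster_ids = np.asarray(cluster_ids)
    cam_ids = np.asarray(cam_ids)
    proxies = {}
    for c, cam in sorted(set(zip(cluster_ids.tolist(), cam_ids.tolist()))):
        mask = (cluster_ids == c) & (cam_ids == cam)
        proxies[(c, cam)] = l2n(feats[mask].mean(axis=0))
    return proxies

def camera_aware_loss(query, cluster, proxies, tau=0.07):
    """Pull the query toward every proxy of its own cluster (one per
    camera), against all proxies in the denominator."""
    keys = list(proxies)
    logits = np.stack([proxies[k] for k in keys]) @ l2n(query) / tau
    denom = np.logaddexp.reduce(logits)
    return float(np.mean([denom - logits[i]
                          for i, k in enumerate(keys) if k[0] == cluster]))
```

Averaging over the same-cluster proxies from all cameras is what encourages camera-invariant features: the query cannot minimize the loss by matching only its own camera's view.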
- Book Chapter
- 10.1007/978-3-031-02444-3_40
- Jan 1, 2022
Unsupervised person re-identification (ReID) aims to match a query image of a pedestrian to the images in a gallery set without supervision labels. The most popular approaches to unsupervised person ReID first perform a clustering algorithm to yield pseudo-labels and then exploit these pseudo-labels to train a deep neural network. However, the pseudo-labels are noisy and sensitive to the hyper-parameter(s) of the clustering algorithm. In this paper, we propose a Hybrid Contrastive Learning (HCL) approach for unsupervised person ReID, which is based on a hybrid of instance-level and cluster-level contrastive loss functions. Moreover, we present a Multi-Granularity Clustering Ensemble based Hybrid Contrastive Learning (MGCE-HCL) approach, which adopts a multi-granularity clustering ensemble strategy to mine priority information among the pseudo positive sample pairs and defines a priority-weighted hybrid contrastive loss that better tolerates the noise in the pseudo positive samples. We conduct extensive experiments on two benchmark datasets, Market-1501 and DukeMTMC-reID. Experimental results validate the effectiveness of our proposals. Keywords: Unsupervised person ReID; Contrastive learning; Cluster ensemble; Multi-granularity
- Conference Article
- 10.1109/icisce50968.2020.00268
- Dec 1, 2020
Recently, deep learning has achieved great success in person re-identification (ReID). This paper reviews state-of-the-art deep-learning-based person ReID algorithms. In terms of the different application problems in person ReID, the algorithms presented are roughly divided into three categories, i.e., image-based person ReID, video-based person ReID, and unsupervised person ReID. For each category, we review the most representative research works in recent years. Our survey addresses existing problems, challenges and future research directions based on analyses of the latest advancements in deep-learning-based person ReID techniques.
- Research Article
- 10.1109/tip.2020.3016869
- Jan 1, 2020
- IEEE Transactions on Image Processing
Unsupervised person re-identification (Re-ID) has better scalability and practicability than supervised Re-ID in actual deployment. However, it is difficult to learn a discriminative Re-ID model without annotations. To address this issue, we propose an end-to-end Self-supervised Agent Learning (SAL) algorithm that exploits a set of agents as a bridge to reduce domain gaps for unsupervised cross-domain person Re-ID. The proposed SAL model enjoys several merits. First, to the best of our knowledge, this is the first work to exploit self-supervised learning for unsupervised person Re-ID. Second, our model features three effective learning mechanisms, including supervised label learning in the source domain, similarity consistency learning in the target domain, and self-supervised learning across domains, which can learn domain-invariant yet discriminative representations through the principled lens of agent learning by adaptively reducing domain discrepancy. Extensive experimental results on three standard benchmarks demonstrate that the proposed SAL performs favorably against state-of-the-art unsupervised person Re-ID methods.
- Research Article
- 10.1007/s13042-021-01308-6
- Apr 2, 2021
- International Journal of Machine Learning and Cybernetics
Person re-identification (Re-ID) models usually show limited performance when trained on one dataset and tested on another, due to the inter-dataset bias (e.g., completely different identities and backgrounds) and the intra-dataset differences (e.g., camera and pose changes). In other words, the absence of identity labels (who the person is) and pairwise labels (whether a pair of images belongs to the same person or not) leads to failures in the unsupervised person Re-ID problem. We argue that considering these two aspects synchronously can improve the performance of an unsupervised person Re-ID model. In this work, we introduce a Classification and Latent Commonality (CLC) method based on transfer learning for the unsupervised person Re-ID problem. Our method has three characteristics: (1) it proposes an imitate model to generate an imitated target domain with estimated identity labels and creates a pseudo target domain to compensate for the missing pairwise labels across camera views; (2) it formulates a dual classification loss on both the source domain and the imitated target domain to learn a discriminative representation and diminish the inter-domain bias; (3) it investigates latent commonality and reduces the intra-domain differences by imposing a triplet loss on the source domain, the imitated target domain and the pairwise-label target domain (composed of the pseudo target domain and the target domain). Extensive experiments are conducted on three widely employed benchmarks, including Market-1501, DukeMTMC-reID and MSMT17, and experimental results demonstrate that the proposed method achieves competitive performance against other state-of-the-art unsupervised Re-ID approaches.
- Research Article
- 10.1109/tmm.2020.3001522
- Jun 15, 2020
- IEEE Transactions on Multimedia
Unsupervised domain adaptation (UDA) aims to mitigate the domain shift that occurs when transferring knowledge from a labeled source domain to an unlabeled target domain. While it has been studied for application in unsupervised person re-identification (ReID), the relations of feature distribution across the source and target domains remain underexplored, as they either ignore the local relations or omit the in-depth consideration of negative transfer when two domains do not share identical label spaces. In light of the above, this paper presents an innovative part-aware progressive adaptation network (PPAN) that exploits global and local relations for UDA-based ReID across domains. A multi-branch network is developed that explicitly learns discriminative feature representation from both whole-body images and body-part images under the supervision of a labeled source domain. Within each network branch, an independent UDA constraint is designed that aligns the global and local feature distributions from a labeled source domain with those of an unlabeled target domain. In addition, a novel progressive adaptation strategy (PAS) is designed that effectively alleviates the negative influence of outlier source identities. The proposed unsupervised ReID model is evaluated on five widely used datasets (Market-1501, DukeMTMC-reID, CUHK03, VIPeR and PRID), and experimental results demonstrate its superior robustness and effectiveness relative to state-of-the-art approaches.
- Conference Article
- 10.1109/ic-nidc54101.2021.9660560
- Nov 17, 2021
Unsupervised person re-identification (Re-ID) is a promising and very challenging research problem in computer vision. Learning robust and discriminative features from unlabeled data is of central importance to Re-ID. Recently, more attention has been paid to unsupervised Re-ID algorithms based on clustered pseudo-labels. However, previous approaches did not fully exploit information from hard samples, simply using the cluster centroid or all instances for contrastive learning. In this paper, we propose a Hard-sample Guided Hybrid Contrastive Learning (HHCL) approach combining a cluster-level loss with an instance-level loss for unsupervised person Re-ID. Our approach applies a cluster-centroid contrastive loss to ensure that the network is updated in a more stable way. Meanwhile, the introduction of a hard-instance contrastive loss further mines discriminative information. Extensive experiments on two popular large-scale Re-ID benchmarks demonstrate that our HHCL outperforms previous state-of-the-art methods and significantly improves the performance of unsupervised person Re-ID. The code of our work will soon be available at https://github.com/bupt-ai-cz/HHCL-ReID
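The hard-instance idea, taking the least-similar same-cluster instance as the positive key and the most confusing instance of each other cluster as a negative, can be sketched as follows. This is an illustrative numpy approximation of that general mining rule, not the HHCL release; features are assumed L2-normalized.

```python
import numpy as np

def hard_instance_loss(query, label, instances, labels, tau=0.05):
    """Positive key: the least-similar instance of the query's own
    cluster. Negative keys: the most-similar (most confusing)
    instance of every other cluster. InfoNCE over those keys."""
    labels = np.asarray(labels)
    own = instances[labels == label]
    keys = [own[np.argmin(own @ query)]]          # hardest positive
    for c in sorted(set(labels.tolist()) - {label}):
        grp = instances[labels == c]
        keys.append(grp[np.argmax(grp @ query)])  # hardest negative
    logits = np.stack(keys) @ query / tau
    return float(np.logaddexp.reduce(logits) - logits[0])
```

Mining the hardest positive forces the whole cluster to stay tight, while mining the hardest negatives concentrates the gradient on the samples most likely to be confused, which is the discriminative information the abstract refers to.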
- Research Article
- 10.1007/s00371-024-03788-3
- Feb 19, 2025
- The Visual Computer
Person re-identification (ReID) aims to identify individuals across diverse surveillance scenarios and plays a pivotal role in centralized monitoring. However, very recent studies pre-trained their models on ImageNet and fine-tuned them on a specific downstream ReID dataset, which leads to restricted generalization capability when identifying person images in poor imaging conditions. To address this limitation, this paper introduces a Self-Supervised Learning model with Task-Oriented Knowledge Distillation for person ReID. The ReID-oriented Prior is proposed to simulate the primary challenges of low-quality imaging and occlusion in real-world scenarios, while a teacher-student network along with relative projectors is adopted as the knowledge distillation paradigm. By incorporating multiple loss functions, the self-supervised network aims not only to restore detailed and masked embeddings but also to align invariant representations between partial and complete semantics. Our model is pre-trained on the person-specific dataset LUPerson without additional hand-crafted labels. Extensive experiments carried out on Market1501, MSMT17, and Occluded-Duke show that our method yields state-of-the-art performance on supervised person ReID. Moreover, the proposed method obtains remarkable performance on partial and unsupervised person ReID, which further indicates its strong generalizability. The code is publicly available at https://github.com/ICT-CVlab/Oriented-KD-SSL.