ACML: Attention-Based Cross-Modality Learning For Cloth-Changing and Occluded Person Re-Identification

Abstract

Person Re-Identification (Re-ID) aims to match images of a person captured across a system of non-overlapping cameras. Real-world Re-ID presents challenges such as clothing changes and occlusions, which limit the applicability of traditional appearance-based methods. Cloth-Changing Re-ID (CCRe-ID) methods that rely on cloth-invariant modalities, such as shape and gait, ignore occlusions and fail to mine the complementary relationships across modalities. Meanwhile, methods that explicitly focus on occlusion management struggle with cloth-changing scenarios. To address both problems, we propose ACML: Attention-based Cross-Modality Learning, the first framework to tackle both clothing changes and occlusion in Re-ID. Our lightweight framework comprises a unified network with cascaded Cross-Attention Blocks that extracts appearance and shape features collaboratively, enhancing robustness under clothing changes, viewpoint variations, and poor illumination. Inputs to the network are produced by our novel occlusion synthesis module, which not only exposes the model to occlusions but also guides it to adaptively attend to informative cues and suppress noise. Experiments demonstrate the effectiveness of ACML on both CCRe-ID and occluded Re-ID datasets.
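The cross-modality idea above can be illustrated generically: tokens from one modality (e.g., appearance) form the queries, while tokens from the other modality (e.g., shape) supply the keys and values, so each modality borrows complementary cues from the other. The sketch below is a minimal NumPy illustration of one such cross-attention step, not ACML's actual implementation; the random projection weights, token counts, and dimensions are all hypothetical placeholders for learned parameters.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values, d_k):
    """One cross-attention step: tokens of one modality (queries)
    attend over tokens of the other modality (keys/values)."""
    # Hypothetical fixed projections; a real block would learn these.
    rng = np.random.default_rng(0)
    d = queries.shape[-1]
    Wq = rng.standard_normal((d, d_k)) / np.sqrt(d)
    Wk = rng.standard_normal((d, d_k)) / np.sqrt(d)
    Wv = rng.standard_normal((d, d_k)) / np.sqrt(d)
    Q = queries @ Wq           # (n_q, d_k)
    K = keys_values @ Wk       # (n_kv, d_k)
    V = keys_values @ Wv       # (n_kv, d_k)
    scores = Q @ K.T / np.sqrt(d_k)       # (n_q, n_kv) similarity
    return softmax(scores, axis=-1) @ V   # (n_q, d_k) fused tokens

# Appearance tokens attend to shape tokens; a full block would
# also run the symmetric direction and cascade several such steps.
appearance = np.random.default_rng(1).standard_normal((16, 64))
shape = np.random.default_rng(2).standard_normal((16, 64))
fused = cross_attention(appearance, shape, d_k=64)
print(fused.shape)  # (16, 64)
```

Cascading such blocks lets each modality repeatedly refine its features against the other, which is the general mechanism the abstract describes.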

Similar Papers
  • Research Article
  • Cited by 8
  • 10.1007/s11042-020-09361-z
Occluded person re-identification based on feature fusion and sparse reconstruction
  • Jul 23, 2020
  • Multimedia Tools and Applications
  • Fei Gao + 4 more

Person re-identification is one of the hotspots in the field of computer vision, and occluded person re-identification in particular remains a challenge. In this paper, a feature-fusion and sparse-reconstruction based method for occluded person re-identification is proposed, which is suitable for various occlusion situations; pose estimation is employed to locate the occluded body parts. A Full Occlusion Re-identification Network (FORN) is developed, in which the obstruction is blackened. In the FORN, partial feature extraction and sparse feature reconstruction are combined through tree connections. The fused features are used in the FORN for occluded person similarity matching, so that the matching rate of person re-identification under various occlusion situations is improved. On the occluded person re-identification datasets Partial-REID and Partial-iLIDS, the FORN method obtains Rank-1 scores of 62.75% and 64.26%, and Rank-3 scores of 79.43% and 73.10%, respectively. Experiments are also conducted on conventional person re-identification datasets, and the results verify the effectiveness and advancement of the proposed method.

  • Conference Article
  • 10.1109/ijcb54206.2022.10007947
MSFL-Net: Multi-Semantic Feature Learning Network for Occluded Person Re-Identification
  • Oct 10, 2022
  • Guangyu Huang + 5 more

Recently, occluded person re-identification (Re-ID) has received significant interest due to its widespread real-world applications. However, most existing occluded person Re-ID methods ignore semantic granularities that indicate different levels of occluded information of the human body, leading to sub-optimal performance. To address this, we propose a Multi-Semantic Feature Learning Network (MSFL-Net) for occluded person Re-ID. Specifically, MSFL-Net involves a backbone network and a Multi-branch Feature Learning sub-network (MFL). MFL consists of two local-global branches and a global branch to learn multi-semantic features in a multi-branch deep network architecture. In each local-global branch, we design a local sub-branch and a semantic-guided global sub-branch to extract discriminative features at a certain level of feature granularity and semantic granularity. In the global branch, we learn global features at the largest level of semantic granularity. In particular, a patch contrastive loss is developed to explicitly encourage the semantic feature maps to capture the information from specific body parts. By extracting multi-semantic features, our method is effective in dealing with person Re-ID at different occlusion levels. Experimental results on an occluded person Re-ID dataset (Occluded-REID) and two partial person Re-ID datasets (Partial-iLIDS and Partial-REID) show the superiority of our method against state-of-the-art person Re-ID methods.

  • Research Article
  • Cited by 40
  • 10.1007/s00521-022-07400-4
Short range correlation transformer for occluded person re-identification
  • Jun 1, 2022
  • Neural Computing and Applications
  • Yunbin Zhao + 3 more

Occluded person re-identification is one of the challenging areas of computer vision, facing problems such as inefficient feature representation and low recognition accuracy. Recently, the vision transformer has been introduced into the field of re-identification and has achieved state-of-the-art results by constructing global feature relationships between patch sequences. However, the vision transformer is not good at capturing short-range correlations within a patch sequence or exploiting its spatial correlation, which decreases the accuracy and robustness of the network on occluded person re-identification. To address these problems, we design a partial-feature transformer-based occluded person re-identification framework named PFT. The proposed PFT utilizes three modules to enhance the efficiency of the vision transformer. (1) Patch full-dimension enhancement module. We design a learnable tensor with the same size as the patch sequence, which is full-dimensional and deeply embedded in the patch sequence to enrich the diversity of training samples. (2) Fusion and reconstruction module. We extract the less important parts of the obtained patch sequence and fuse them with the original patch sequence to reconstruct it. (3) Spatial slicing module. We slice and group patch sequences along the spatial direction, which effectively improves their short-range correlation. Experimental results on occluded and holistic re-identification datasets demonstrate that the proposed PFT network consistently achieves superior performance and outperforms state-of-the-art methods.

  • Research Article
  • Cited by 107
  • 10.1016/j.eswa.2023.122419
Occluded person re-identification with deep learning: A survey and perspectives
  • Nov 2, 2023
  • Expert Systems with Applications
  • Enhao Ning + 4 more

  • Research Article
  • 10.22581/0449
Overcoming Occlusion in Person Re-Identification: A Multi-Level Attention Transformer Approach
  • Jan 1, 2026
  • Mehran University Research Journal of Engineering and Technology
  • Dr Najma Imtiaz Ali Brohi + 5 more

Person re-identification (ReID) in real-world surveillance scenarios is a very challenging problem, in which occlusions are a major culprit that can severely degrade the performance of existing systems. In this paper, we move one step closer to solving this critical problem by proposing a novel Multi-Level Attention Mechanism (MLAM) for occluded person re-identification. Our approach uses spatial, channel, and global context attention to tackle occlusion cases ranging from partial to severe. The proposed method integrates two key architectures: the Occlusion-Aware ReID Transformer (OART) and the Multi-Level Attention Transformer Network (MLATN). Specifically, we demonstrate that the proposed framework enables adaptive feature extraction and occlusion-aware fusion, which brings large robustness gains for adaptive ReID in challenging real-world environments. We evaluate the approach through extensive experiments on the challenging Occluded-DukeMTMC and Occluded-REID datasets and demonstrate its superiority. On Occluded-DukeMTMC, MLAM achieves state-of-the-art performance, with gains of 2.7% in Rank-1 accuracy and 5.1% in mean Average Precision (mAP), respectively. We also propose the Occlusion Robustness Index (ORI), a new model-invariant metric to quantify model resilience to occlusions. Beyond surveillance, the results of this research are applicable to autonomous driving, robotics, and augmented reality. Although significant advances have been made, they cast into sharp relief significant ethical issues around privacy and the protection of information, and the need for accountable development and deployment of such technologies. Toward this end, we believe this work presents a large step toward occluded person re-identification and the development of robust, adaptable vision recognition systems for difficult real-world circumstances.

  • Research Article
  • Cited by 29
  • 10.1145/3610534
Deep Learning Based Occluded Person Re-Identification: A Survey
  • Oct 23, 2023
  • ACM Transactions on Multimedia Computing, Communications, and Applications
  • Yunjie Peng + 6 more

Occluded person re-identification (Re-ID) focuses on addressing the occlusion problem when retrieving the person of interest across non-overlapping cameras. With the increasing demand for intelligent video surveillance and the application of person Re-ID technology, the real-world occlusion problem draws considerable interest from researchers. Although a large number of occluded person Re-ID methods have been proposed, there are few surveys that focus on occlusion. To fill this gap and help boost future research, this article provides a systematic survey of occluded person Re-ID. In this work, we review recent deep learning based occluded person Re-ID research. First, we summarize the main issues caused by occlusion as four groups: position misalignment, scale misalignment, noisy information, and missing information. Second, we categorize existing methods into six solution groups: matching, image transformation, multi-scale features, attention mechanism, auxiliary information, and contextual recovery. We also discuss the characteristics of each approach, as well as the issues they address. Furthermore, we present the performance comparison of recent occluded person Re-ID methods on four public datasets: Partial-ReID, Partial-iLIDS, Occluded-ReID, and Occluded-DukeMTMC. We conclude the study with thoughts on promising future research directions.

  • Conference Article
  • Cited by 24
  • 10.1109/icassp43922.2022.9746734
Occluded Person Re-Identification Via Relational Adaptive Feature Correction Learning
  • May 23, 2022
  • Minjung Kim + 4 more

Occluded person re-identification (Re-ID) in images captured by multiple cameras is challenging because the target person is occluded by pedestrians or objects, especially in crowded scenes. In addition to the processes performed during holistic person Re-ID, occluded person Re-ID involves the removal of obstacles and the detection of partially visible body parts. Most existing methods utilize off-the-shelf pose or parsing networks as pseudo labels, which are prone to error. To address these issues, we propose a novel Occlusion Correction Network (OCNet) that corrects features through relational-weight learning and obtains diverse and representative features without using external networks. In addition, we present the simple concept of a center feature to provide an intuitive solution to pedestrian occlusion scenarios. Furthermore, we suggest a Separation Loss (SL) for focusing on the different parts captured by global features and part features. We conduct extensive experiments on five challenging benchmark datasets for occluded and holistic Re-ID tasks to demonstrate that our method achieves superior performance to state-of-the-art methods, especially on occluded scenes.

  • Conference Article
  • Cited by 62
  • 10.1109/iccv48922.2021.01166
Occluded Person Re-Identification with Single-scale Global Representations
  • Oct 1, 2021
  • Cheng Yan + 5 more

Occluded person re-identification (ReID) aims at re-identifying occluded pedestrians from occluded or holistic images taken across multiple cameras. Current state-of-the-art (SOTA) occluded ReID models rely on some auxiliary modules, including pose estimation, feature pyramid and graph matching modules, to learn multi-scale and/or part-level features to tackle the occlusion challenges. This unfortunately leads to complex ReID models that (i) fail to generalize to challenging occlusions of diverse appearance, shape or size, and (ii) become ineffective in handling non-occluded pedestrians. However, real-world ReID applications typically have highly diverse occlusions and involve a hybrid of occluded and non-occluded pedestrians. To address these two issues, we introduce a novel ReID model that learns discriminative single-scale global-level pedestrian features by enforcing a novel exponentially sensitive yet bounded distance loss on occlusion-based augmented data. We show for the first time that learning single-scale global features without using these auxiliary modules is able to outperform the SOTA multi-scale and/or part-level feature-based models. Further, our simple model can achieve new SOTA performance in both occluded and non-occluded ReID, as shown by extensive results on three occluded and two general ReID benchmarks. Additionally, we create a large-scale occluded person ReID dataset with various occlusions in different scenes, which is significantly larger and contains more diverse occlusions and pedestrian dressings than existing occluded ReID datasets, providing a more faithful occluded ReID benchmark. The dataset is available at: https://git.io/OPReID

  • Conference Article
  • Cited by 401
  • 10.1109/cvpr46437.2021.00292
Diverse Part Discovery: Occluded Person Re-identification with Part-Aware Transformer
  • Jun 1, 2021
  • Yulin Li + 5 more

Occluded person re-identification (Re-ID) is a challenging task as persons are frequently occluded by various obstacles or other persons, especially in crowded scenarios. To address these issues, we propose a novel end-to-end Part-Aware Transformer (PAT) for occluded person Re-ID through diverse part discovery via a transformer encoder-decoder architecture, including a pixel context based transformer encoder and a part prototype based transformer decoder. The proposed PAT model enjoys several merits. First, to the best of our knowledge, this is the first work to exploit the transformer encoder-decoder architecture for occluded person Re-ID in a unified deep model. Second, to learn part prototypes well with only identity labels, we design two effective mechanisms including part diversity and part discriminability. Consequently, we can achieve diverse part discovery for occluded person Re-ID in a weakly supervised manner. Extensive experimental results on six challenging benchmarks for three tasks (occluded, partial and holistic Re-ID) demonstrate that our proposed PAT performs favorably against state-of-the-art methods.

  • Research Article
  • Cited by 93
  • 10.1109/tcsvt.2020.3033165
Semantic-Aware Occlusion-Robust Network for Occluded Person Re-Identification
  • Oct 22, 2020
  • IEEE Transactions on Circuits and Systems for Video Technology
  • Xiaokang Zhang + 4 more

In recent years, deep learning-based person re-identification (Re-ID) methods have made significant progress. However, the performance of these methods substantially decreases when dealing with occlusion, which is ubiquitous in realistic scenarios. In this article, we propose a novel semantic-aware occlusion-robust network (SORN) that effectively exploits the intrinsic relationship between the tasks of person Re-ID and semantic segmentation for occluded person Re-ID. Specifically, the SORN is composed of three branches, including a local branch, a global branch, and a semantic branch. In particular, the local branch extracts part-based local features, and the global branch leverages a novel spatial-patch contrastive loss (SPC) to extract occlusion-robust global features. Meanwhile, the semantic branch generates a foreground-background mask for a pedestrian image, which indicates the non-occluded areas of the human body. The three branches are jointly trained in a unified multi-task learning network. Finally, pedestrian matching is performed based on the local features extracted from the non-occluded areas and the global features extracted from the whole pedestrian image. Extensive experimental results on a large-scale occluded person Re-ID dataset (i.e., Occluded-DukeMTMC) and two partial person Re-ID datasets (i.e., Partial-REID and Partial-iLIDS) show the superiority of the proposed method compared with several state-of-the-art methods for occluded and partial person Re-ID. We also demonstrate the effectiveness of the proposed method on two general person Re-ID datasets (i.e., Market-1501 and DukeMTMC-reID).

  • Research Article
  • Cited by 5
  • 10.3390/sym15040906
EcReID: Enhancing Correlations from Skeleton for Occluded Person Re-Identification
  • Apr 13, 2023
  • Symmetry
  • Minling Zhu + 1 more

Person re-identification is a challenging task due to the lack of person image information in occluded scenarios. Current methods for person re-identification take into account only global information, neglect local information, and are not responsive to changes in the input. Additionally, these methods do not address the inaccurate joint detection caused by occlusion. In this paper, we propose an occluded person re-identification method based on a graph model and a deformable method, which is able to focus on global and local information simultaneously and can flexibly adapt to local information and changes in the input, efficiently resolving issues such as occluded or incorrect joint information. Our method consists of three modules: the mutual-help denoising module, the inter-node aggregation and update module, and the graph matching module. The mutual-help denoising module acquires global features and person skeleton node features using a CNN backbone network and a pose estimation model, respectively. It uses symmetric deformable graph attention to obtain the local and global features of the joint points in different views, correcting the information of incorrect nodes and extracting favorable human features. The inter-node aggregation and update module employs deformable graph convolution operations to enhance the relations between the nodes in the same view, yielding higher-order information. The graph matching module uses graph matching methods based on the human topology to obtain a more accurate similarity calculation for masked images. Experimental results on the Occluded-Duke and Occluded-REID datasets show that our proposed method achieves Rank-1 accuracies of 64.8% and 84.5%, respectively, outperforming current mainstream methods such as HOReID. Our method also achieves good results on the Market-1501 and DukeMTMC-reID datasets. These results demonstrate that our proposed method can extract person features well and effectively improve the accuracy of person re-identification tasks.

  • Research Article
  • Cited by 26
  • 10.1609/aaai.v38i5.28312
Occluded Person Re-identification via Saliency-Guided Patch Transfer
  • Mar 24, 2024
  • Proceedings of the AAAI Conference on Artificial Intelligence
  • Lei Tan + 5 more

While generic person re-identification has made remarkable improvement in recent years, these methods are designed under the assumption that the entire body of the person is available. This assumption brings about a significant performance degradation when suffering from occlusion caused by various obstacles in real-world applications. To address this issue, data-driven strategies have emerged to enhance the model's robustness to occlusion. Following the random erasing paradigm, these strategies typically employ randomly generated noise to supersede randomly selected image regions to simulate obstacles. However, the random strategy is not sensitive to location and content, meaning they cannot mimic real-world occlusion cases in application scenarios. To overcome this limitation and fully exploit the real scene information in datasets, this paper proposes a more intuitive and effective data-driven strategy named Saliency-Guided Patch Transfer (SPT). Combined with the vision transformer, SPT divides person instances and background obstacles using salient patch selection. By transferring person instances to different background obstacles, SPT can easily generate photo-realistic occluded samples. Furthermore, we propose an occlusion-aware Intersection over Union (OIoU) with mask-rolling to filter the more suitable combination and a class-ignoring strategy to achieve more stable processing. Extensive experimental evaluations conducted on occluded and holistic person re-identification benchmarks demonstrate that SPT provides a significant performance gain among different ViT-based ReID algorithms on occluded ReID.
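The random erasing paradigm that SPT improves upon can be illustrated with a minimal sketch: a randomly placed rectangle is blanked out (or filled with noise) to simulate an obstacle. The function below is a generic illustration under assumed parameters (`area_frac`, zero fill), not the SPT method itself, which instead transfers photo-realistic salient patches.

```python
import numpy as np

def random_occlude(img, area_frac=0.2, seed=None):
    """Random-erasing-style occlusion: blank out a randomly placed
    rectangle covering roughly `area_frac` of the image."""
    rng = np.random.default_rng(seed)
    h, w = img.shape[:2]
    # Side lengths so the rectangle covers ~area_frac of the image.
    oh = max(1, int(h * np.sqrt(area_frac)))
    ow = max(1, int(w * np.sqrt(area_frac)))
    top = rng.integers(0, h - oh + 1)
    left = rng.integers(0, w - ow + 1)
    out = img.copy()
    out[top:top + oh, left:left + ow] = 0  # noise fill is also common
    return out

img = np.ones((8, 8), dtype=float)
occ = random_occlude(img, area_frac=0.25, seed=0)
print(int((occ == 0).sum()))  # 16: a 4x4 block of an 8x8 image erased
```

As the SPT abstract notes, this kind of synthetic erasure is insensitive to location and content, which is precisely the limitation saliency-guided patch transfer aims to remove.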

  • Research Article
  • 10.5194/isprs-archives-xlviii-2-w5-2024-41-2024
Study on Unsupervised Instance Segmentation Models for Person Re-Identification
  • Dec 16, 2024
  • The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
  • Margarita N Favorskaya + 1 more

Abstract. Unsupervised instance segmentation for person re-identification is mainly used in challenging cases such as occluded person re-identification and 3D re-identification. Furthermore, unsupervised instance segmentation can be considered an auxiliary cue, especially useful for long-term person re-identification using multiple cameras and single images. Several instance segmentation models, one-stage and two-stage, were examined in this study. We considered two main families of one-stage instance segmentation models, YOLO-based and SOLO-based, and trained the most interesting of them. Several datasets were used for the experiments, including the Market1501, MSMT17, DukeMTMC, DukeMTMC-reID, CUHK03, and VIPeR datasets. The Mask R-CNN model demonstrated the best accuracy, and the YOLACT++ model showed the best computational results in terms of instance segmentation. To compare the accuracy without and with instance segmentation, the BUC model for person re-identification was used as a baseline. The experimental results show an increase in Rank-1 accuracy values by an average of 2.7–4.9%.

  • Research Article
  • Cited by 4
  • 10.1016/j.dsp.2023.104166
Robust feature mining transformer for occluded person re-identification
  • Jul 25, 2023
  • Digital Signal Processing
  • Zhenzhen Yang + 3 more

  • Research Article
  • Cited by 71
  • 10.1109/tcsvt.2021.3088446
Occlusion-Sensitive Person Re-Identification via Attribute-Based Shift Attention
  • Apr 1, 2022
  • IEEE Transactions on Circuits and Systems for Video Technology
  • Hanyang Jin + 2 more

Occluded person re-identification is one of the most challenging tasks in security surveillance. Most existing methods focus on extracting human body features from occluded pedestrian images. This paper prioritizes a difference between occluded and non-occluded person re-ID: when computing the similarity between a holistic pedestrian image and an occluded pedestrian image, a certain part of the human body in the holistic image can be distractive for pedestrian retrieval. To solve this problem, we propose an occluded person re-ID framework named the attribute-based shift attention network (ASAN). First, unlike other methods that use off-the-shelf tools to locate pedestrian body parts in occluded images, we design an attribute-guided occlusion-sensitive pedestrian segmentation (AOPS) module. AOPS is a weakly supervised method that leverages the semantic-level attribute annotations in person re-ID datasets. Second, guided by the pedestrian masks provided by AOPS, a shift feature adaption (SFA) module extracts the visible part of the human body feature in a part-based manner. After that, a visible region matching (VRM) algorithm is proposed to filter out the interference information in holistic person images during the retrieval phase and further purify the representation of pedestrian features. Extensive experiments with ablation analysis demonstrate our method's effectiveness, and state-of-the-art results are achieved on four occluded datasets: Partial-REID, Partial-iLIDS, Occluded-DukeMTMC, and Occluded-REID. Moreover, experiments on two holistic person re-ID datasets, Market-1501 and DukeMTMC-reID, and a vehicle re-ID dataset, VeRi-776, show that ASAN also has good generality.
