Abstract

Most existing text–image person re-identification (TIReID) methods assume an ideal setting in which every image and text instance is complete and annotated with an identity. In real-world open environments, these assumptions often do not hold. In this study, we are the first to explore how to make TIReID models robust to open environments, focusing on two conditions: 1) unlabeled multi-modal data and 2) incomplete multi-modal data. Unlabeled multi-modal data raises two key issues, intra-class variation and cross-modal alignment, while incomplete data causes semantic inconsistency and severe performance degradation. To tackle these issues, we propose a novel Cross-modal Semantic Aligning and Neighbor-aware Completing (CANC) method for robust text–image person re-identification (RTIReID). Specifically, to address intra-class variation, we propose intra-view prototype contrastive matching for the image and text modalities, which helps the model learn discriminative intra-modal representations by increasing inter-class distance and suppressing intra-class variation. To address cross-modal alignment, we propose cross-view instance projection matching, which establishes the relationship between the representations of the two modalities. To handle incomplete samples, we introduce nearest neighbor consistent completion, which recovers high-quality completed features. We adopt CLIP-ViT and CLIP-Xformer as the image and text encoders, respectively, and report Rank-k (k = 1, 5, 10) as the primary evaluation metrics. Extensive experiments on several public datasets demonstrate that our method effectively addresses TIReID in open environments and consistently surpasses several state-of-the-art TIReID methods.
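
For readers unfamiliar with the Rank-k metric mentioned above, the following is a minimal sketch of how it is commonly computed for text-to-image retrieval: each text query ranks all gallery images by similarity, and the query counts as a hit at k if an image with the same identity appears among the top k. The function name `rank_k_accuracy`, the similarity values, and the identity labels are illustrative assumptions and are not taken from the paper.

```python
from typing import Dict, Sequence


def rank_k_accuracy(
    similarity: Sequence[Sequence[float]],
    query_ids: Sequence[int],
    gallery_ids: Sequence[int],
    ks: Sequence[int] = (1, 5, 10),
) -> Dict[int, float]:
    """Fraction of queries whose top-k gallery items contain a correct identity.

    similarity[i][j] is the (assumed) similarity between text query i and
    gallery image j; higher means more similar.
    """
    hits = {k: 0 for k in ks}
    for row, qid in zip(similarity, query_ids):
        # Gallery indices sorted by descending similarity to this query.
        order = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        # 1-based rank of the first gallery image sharing the query identity.
        first_hit = next(
            (pos + 1 for pos, j in enumerate(order) if gallery_ids[j] == qid),
            None,
        )
        for k in ks:
            if first_hit is not None and first_hit <= k:
                hits[k] += 1
    n = len(query_ids)
    return {k: hits[k] / n for k in ks}


# Toy example with made-up scores: 2 text queries, 3 gallery images.
sim = [
    [0.9, 0.2, 0.1],  # query 0 (identity 7): correct image is ranked first
    [0.3, 0.8, 0.5],  # query 1 (identity 9): correct image is ranked second
]
scores = rank_k_accuracy(sim, query_ids=[7, 9], gallery_ids=[7, 8, 9], ks=(1, 2))
print(scores)
```

In this toy case only the first query is matched at rank 1, while both are matched within the top 2, so Rank-1 is 0.5 and Rank-2 is 1.0.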
