Abstract

Traditional text-image person retrieval methods rely heavily on fully matched, identity-annotated multimodal data, an ideal yet limited scenario. In real-world applications, multimodal data are often incomplete, and annotating them is costly and complex. In response to these challenges, we consider a more robust and pragmatic setting termed unsupervised incomplete text-image person retrieval, in which person images and text descriptions are not fully matched and lack the supervision of identity labels. To tackle these two problems, we propose the Enhancing Cross-modal Completion and Alignment (ECCA) method. Specifically, we propose a feature-level cross-modal completion strategy for incomplete data, which leverages available cross-modal features with high semantic similarity to construct relational graphs for missing-modality data, yielding more reliable completion features. Additionally, to address cross-modal matching ambiguity, we propose a weighted inter-instance granularity alignment module and an enhanced prototype-wise granularity alignment module, which map semantically similar image-text pairs more compactly in the common embedding space. Extensive experiments on public datasets demonstrate the consistent superiority of our method over state-of-the-art text-image person retrieval methods.
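To make the completion strategy concrete, below is a minimal PyTorch sketch of one plausible reading of feature-level cross-modal completion: for an image whose text description is missing, the text features of its top-k most similar matched images are aggregated with similarity-derived weights (the retained neighbors acting as edges of a relational graph). The function name and the `top_k` and `temperature` parameters are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def complete_missing_text(img_feats, txt_feats, miss_img_feats,
                          top_k=5, temperature=0.1):
    """Generate pseudo text features for images lacking descriptions.

    img_feats:      (N, D) image features of fully matched pairs
    txt_feats:      (N, D) text features of fully matched pairs
    miss_img_feats: (M, D) image features whose text modality is missing
    """
    # Normalize so dot products are cosine similarities.
    img_feats = F.normalize(img_feats, dim=-1)
    miss_img_feats = F.normalize(miss_img_feats, dim=-1)

    # Similarity between incomplete images and the matched images.
    sim = miss_img_feats @ img_feats.t()                 # (M, N)

    # Keep the k most similar neighbors: the edges of the relational graph.
    topk_sim, topk_idx = sim.topk(top_k, dim=-1)         # (M, k)

    # Soft edge weights over the retained neighbors.
    weights = F.softmax(topk_sim / temperature, dim=-1)  # (M, k)

    # Weighted aggregation of the neighbors' paired text features
    # yields the completion features for the missing modality.
    neighbor_txt = txt_feats[topk_idx]                   # (M, k, D)
    completed_txt = (weights.unsqueeze(-1) * neighbor_txt).sum(dim=1)
    return F.normalize(completed_txt, dim=-1)
```

Restricting aggregation to high-similarity neighbors, rather than averaging over the whole batch, is what makes the completed features reliable: low-similarity instances contribute no edge and thus no noise.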
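For the weighted inter-instance alignment, one hedged sketch of the idea under the unsupervised setting: without identity labels, hard one-hot contrastive targets are too brittle, so soft targets derived from cross-modal similarity let semantically close (likely same-identity) image-text pairs also attract each other. This is an illustrative soft-label contrastive variant, not the paper's exact module; the mixing weight of 0.5 and the temperature `tau` are assumptions.

```python
import torch
import torch.nn.functional as F

def weighted_alignment_loss(img_feats, txt_feats, tau=0.07):
    img = F.normalize(img_feats, dim=-1)
    txt = F.normalize(txt_feats, dim=-1)

    logits = img @ txt.t() / tau                         # (B, B)

    # Soft targets: blend the diagonal (the given pairing) with cross-modal
    # similarity so probable same-identity pairs also receive weight.
    with torch.no_grad():
        sim = img @ txt.t()
        eye = torch.eye(len(img), device=img.device)
        targets_i2t = 0.5 * eye + 0.5 * F.softmax(sim / tau, dim=-1)
        targets_t2i = 0.5 * eye + 0.5 * F.softmax(sim.t() / tau, dim=-1)

    # Symmetric soft cross-entropy over both retrieval directions.
    loss_i2t = -(targets_i2t * F.log_softmax(logits, dim=-1)).sum(-1).mean()
    loss_t2i = -(targets_t2i * F.log_softmax(logits.t(), dim=-1)).sum(-1).mean()
    return 0.5 * (loss_i2t + loss_t2i)
```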
