Abstract

The goal of text-based person re-identification (Re-ID) is to retrieve the image of a target person given a textual description. However, owing to homogeneous variety and modality heterogeneity, it is challenging to simultaneously learn both global-level and local-level cross-modal features and align them in the same embedding space without additional networks. To address these problems, an effective multi-level cross-modality learning (MCL) framework for language-and-vision person Re-ID is proposed. Specifically, a multi-branch feature extraction (MFE) module is designed to map both global and partial semantic information into the visual and textual embeddings simultaneously, capturing intra-class semantic relationships at multiple granularities. In addition, a cross-modal alignment (CA) module is devised to match the multi-grained representations and reduce the inter-class gap from the global level to the partial level. Extensive experiments on the CUHK-PEDES and ICFG-PEDES datasets show that this method outperforms state-of-the-art models.
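As a rough illustration of the idea described above, the sketch below shows one possible way to embed global and part-level features from each modality into a shared space and align them with a symmetric contrastive loss. All module names, dimensions, the number of parts, and the InfoNCE-style objective are assumptions for illustration only; they are not the paper's actual MFE/CA formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiBranchEmbed(nn.Module):
    """Hypothetical multi-branch head: projects one modality's features into a
    shared space at a global level and at K part-level granularities."""
    def __init__(self, in_dim: int, embed_dim: int, num_parts: int = 6):
        super().__init__()
        self.global_fc = nn.Linear(in_dim, embed_dim)
        self.part_fcs = nn.ModuleList(
            nn.Linear(in_dim, embed_dim) for _ in range(num_parts)
        )

    def forward(self, global_feat, part_feats):
        # global_feat: (B, in_dim); part_feats: (B, K, in_dim)
        g = F.normalize(self.global_fc(global_feat), dim=-1)
        p = torch.stack(
            [F.normalize(fc(part_feats[:, k]), dim=-1)
             for k, fc in enumerate(self.part_fcs)],
            dim=1,
        )
        return g, p  # (B, D), (B, K, D)

def alignment_loss(img_g, txt_g, img_p, txt_p, tau: float = 0.02):
    """Stand-in cross-modal alignment: symmetric InfoNCE on global embeddings
    plus an averaged part-level term (an assumed objective, not the CA module
    as defined in the paper)."""
    def info_nce(a, b):
        logits = a @ b.t() / tau                       # (B, B) similarities
        labels = torch.arange(a.size(0), device=a.device)
        return 0.5 * (F.cross_entropy(logits, labels) +
                      F.cross_entropy(logits.t(), labels))

    loss = info_nce(img_g, txt_g)
    num_parts = img_p.size(1)
    for k in range(num_parts):
        loss = loss + info_nce(img_p[:, k], txt_p[:, k]) / num_parts
    return loss
```

In this sketch, global and part-level branches share the same embedding dimension so that image and text representations can be compared directly at every granularity, which is one straightforward way to avoid extra alignment networks.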
