Learning convolutional multi-level transformers for image-based person re-identification

Peilei Yan, Xuehu Liu, Pingping Zhang, Huchuan Lu

doi:10.1007/s44267-023-00025-8

Open Access

https://doi.org/10.1007/s44267-023-00025-8

Journal: Visual Intelligence
Publication Date: Oct 13, 2023
Citations: 10
License type: CC BY 4.0

Affiliation: Dalian University of Technology

Abstract

As a vital vision task, person re-identification (Re-ID) aims to retrieve images of the same person across non-overlapping cameras. The task is very challenging because of complex backgrounds, varying illumination and different perspectives. In this work, we integrate the advantages of convolutional neural networks (CNNs) and transformers, and propose a novel learning framework named convolutional multi-level transformer (CMT) for image-based person Re-ID. Specifically, we first propose a scale-aware feature enhancement (SFE) module to extract multi-scale local features from a pre-trained CNN backbone. Then, we introduce a part-aware transformer encoder (PTE) to further mine discriminative local information guided by global semantics. Finally, a deeply-supervised learning (DSL) technique is adopted to optimize the proposed CMT and improve its training efficiency. Extensive experiments on four large-scale Re-ID benchmarks demonstrate that our method performs favorably against several state-of-the-art methods.
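
To make the retrieval setting concrete, the following is a minimal sketch of how a Re-ID query might be matched against a gallery once feature vectors are available. It is not the CMT model: the feature vectors are hand-written toy values, the dictionary keys (`person_id`, `camera`, `feature`) and the function names are hypothetical, and cosine similarity is assumed as the matching score. For simplicity the sketch skips every gallery image taken by the query's own camera, which is an assumption made here to reflect the cross-camera setting.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def rank_gallery(query, gallery):
    """Return gallery entries from other cameras, most similar first.

    Assumption: each entry is a dict with the hypothetical keys
    'person_id', 'camera' and 'feature'.
    """
    candidates = [g for g in gallery if g["camera"] != query["camera"]]
    return sorted(
        candidates,
        key=lambda g: cosine_similarity(query["feature"], g["feature"]),
        reverse=True,
    )


# Toy data: in practice the features would come from a trained network.
query = {"person_id": 1, "camera": "A", "feature": [0.9, 0.1, 0.0]}
gallery = [
    {"person_id": 1, "camera": "B", "feature": [0.8, 0.2, 0.1]},
    {"person_id": 2, "camera": "B", "feature": [0.1, 0.9, 0.2]},
    {"person_id": 1, "camera": "A", "feature": [0.9, 0.1, 0.0]},  # same camera, skipped
    {"person_id": 3, "camera": "C", "feature": [0.0, 0.2, 0.9]},
]

ranking = rank_gallery(query, gallery)
for entry in ranking:
    score = cosine_similarity(query["feature"], entry["feature"])
    print(entry["person_id"], entry["camera"], round(score, 3))
```

In this toy example the gallery image of person 1 from camera B ranks first, since its feature vector points in nearly the same direction as the query's.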

Full Text