Abstract

For robust multi-modal person re-identification (re-ID), it is crucial to effectively exploit the complementary information and constraint relationships among different modalities. However, current multi-modal methods often overlook the correlation between modalities at the feature fusion stage. To address this issue, we propose a novel multi-modal person re-ID method called Transformer Relation Regularization (TRR). First, we introduce an adaptive collaborative matching module that exchanges useful information between modalities by mining their feature correspondences; integrating these complementary cues improves re-ID performance. Second, we propose an enhanced embedding module that corrects general features using the discriminative information within each modality, improving the model's stability in challenging multi-modal environments. Finally, we propose an adaptive triplet loss that raises sample utilization efficiency and mitigates the inconsistent representation of multi-modal samples, strengthening the model's ability to distinguish between identities and thereby improving re-ID accuracy. Experimental results on several challenging visible-infrared person re-ID benchmark datasets show that TRR achieves state-of-the-art performance, and extensive ablation studies validate the contribution of each component. In summary, TRR effectively leverages complementary information and inter-modality correlations to improve re-ID in multi-modal scenarios.
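The abstract does not give implementation details, but the collaborative matching idea maps naturally onto cross-attention between the token features of the two modalities. Below is a minimal PyTorch sketch under that assumption; the class name, dimensions, and the choice of a shared MultiheadAttention layer are illustrative, not the paper's actual module.

```python
import torch.nn as nn

class CollaborativeMatching(nn.Module):
    """Hypothetical sketch of cross-modal information exchange.

    Each modality's tokens attend to the other modality's tokens,
    so feature correspondences mined by attention carry complementary
    cues across the visible/infrared gap. Details are assumptions.
    """

    def __init__(self, dim=512, heads=8):
        super().__init__()
        # One attention layer is shared by both directions here purely
        # to keep the sketch short; the real module may use two.
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, vis_tokens, ir_tokens):
        # Visible tokens query infrared tokens, and vice versa.
        vis_out, _ = self.attn(vis_tokens, ir_tokens, ir_tokens)
        ir_out, _ = self.attn(ir_tokens, vis_tokens, vis_tokens)
        # Residual connection keeps each modality's own features intact.
        return self.norm(vis_tokens + vis_out), self.norm(ir_tokens + ir_out)
```

Likewise, the adaptive triplet loss is only described by its goals (sample efficiency and consistent cross-modal representation), so the following sketch mines hard examples across modalities and scales the margin by the current cross-modal gap; that margin rule is an assumption, not the paper's formulation.

```python
import torch
import torch.nn.functional as F

def cross_modal_triplet_loss(visible_feats, infrared_feats, labels, base_margin=0.3):
    """Hypothetical cross-modal triplet loss with an adaptive margin.

    Assumes the visible and infrared batches are identity-aligned
    (e.g., PK sampling), so both share the same `labels` tensor and
    every anchor has at least one cross-modal positive.
    """
    dist = torch.cdist(visible_feats, infrared_feats)        # (N, N) cross-modal distances
    same_id = labels.unsqueeze(1) == labels.unsqueeze(0)      # positive-pair mask

    # Hard mining: farthest positive and nearest negative per anchor.
    d_pos = (dist - 1e9 * (~same_id).float()).max(dim=1).values
    d_neg = (dist + 1e9 * same_id.float()).min(dim=1).values

    # Assumed "adaptive" rule: widen the margin while same-identity
    # embeddings from the two modalities remain far apart.
    margin = base_margin * (1.0 + d_pos.detach().mean())
    return F.relu(d_pos - d_neg + margin).mean()
```

With PK sampling (P identities, K images per modality), `visible_feats` and `infrared_feats` each have shape (P*K, D) and share one `labels` tensor, which is what makes the single mask above sufficient for both mining directions.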
