Abstract

Variations in the image-capturing process create a gap between source and target sets, which poses a challenge for unsupervised domain adaptation (UDA) in person re-identification (re-ID). Given a labeled source training set and an unlabeled target training set, this paper focuses on improving the generalization ability of the re-ID model on the target testing set. The proposed method enforces two properties simultaneously: (1) camera invariance, achieved through positive learning between unlabeled target images and their camera-style-transferred counterparts; and (2) more robust and accurate feature extraction in the backbone network, achieved by adding a position-channel dual attention mechanism. The proposed model uses a classic dual-stream network. Comparative experimental results on three public benchmarks demonstrate the superiority of the proposed method.
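The position-channel dual attention mechanism mentioned above can be sketched in PyTorch. This is a minimal illustration of the general dual-attention pattern (spatial self-attention plus channel self-attention, fused by summation), not the paper's exact architecture; all layer shapes and the `gamma` residual weighting are assumptions.

```python
import torch
import torch.nn as nn

class PositionAttention(nn.Module):
    """Spatial self-attention: every position attends to all positions."""
    def __init__(self, in_channels):
        super().__init__()
        self.query = nn.Conv2d(in_channels, in_channels // 8, 1)
        self.key = nn.Conv2d(in_channels, in_channels // 8, 1)
        self.value = nn.Conv2d(in_channels, in_channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual weight

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)   # (B, HW, C//8)
        k = self.key(x).flatten(2)                     # (B, C//8, HW)
        attn = torch.softmax(q @ k, dim=-1)            # (B, HW, HW)
        v = self.value(x).flatten(2)                   # (B, C, HW)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x

class ChannelAttention(nn.Module):
    """Channel self-attention: every channel attends to all channels."""
    def __init__(self):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        b, c, h, w = x.shape
        flat = x.flatten(2)                                        # (B, C, HW)
        attn = torch.softmax(flat @ flat.transpose(1, 2), dim=-1)  # (B, C, C)
        out = (attn @ flat).view(b, c, h, w)
        return self.gamma * out + x

# Fuse the two attention branches by element-wise summation.
feat = torch.randn(2, 64, 16, 8)   # a backbone feature map (B, C, H, W)
fused = PositionAttention(64)(feat) + ChannelAttention()(feat)
```

Because `gamma` is initialized to zero, each branch starts as an identity mapping and the network learns how strongly to weight the attention responses during training.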

Highlights

  • Person re-identification is a hot research topic with considerable practical value in computer vision

  • To verify the effectiveness of the attention mechanism, DukeMTMC-reID is used as the source domain and performance is evaluated on Market-1501

  • The final accuracy shows that the proposed model performs best only when the position attention mechanism and the channel attention mechanism are added simultaneously


Summary

Introduction

Person re-identification (re-ID) is a hot research topic with considerable practical value in computer vision. To alleviate the impact of camera style variation, Zhong [17] applied CycleGAN (CamStyle) to generate camera-style-transferred images as data augmentation for person re-ID. Because the target domain is unlabeled and lacks the corresponding strong constraints, existing models cannot adequately suppress the impact of changes between cameras (including viewing angle and background). In this case, nearest-neighbor search tends to select candidates captured by the same camera as the probe, even when those candidates are incorrect. Accordingly, the target domain is divided into sub-domains according to the number of cameras, and image style transfer is performed between these sub-domains. The rest of this paper is structured as follows: Section 2 discusses related work; Section 3 presents the proposed method; Section 4 compares the proposed method with state-of-the-art methods and analyzes the experimental results; and Section 5 concludes the paper.
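The camera-invariance idea above, pulling an image's embedding toward the embeddings of its camera-style-transferred counterparts, can be sketched with a simple cosine-similarity loss. This is a hypothetical formulation for illustration only; the paper's actual objective (e.g., an exemplar-memory-based classification loss) may differ.

```python
import torch
import torch.nn.functional as F

def camera_invariance_loss(anchor: torch.Tensor, positives: torch.Tensor) -> torch.Tensor:
    """Mean cosine distance between one image embedding (D,) and the
    embeddings of its K style-transferred counterparts (K, D).

    Minimizing this pulls the anchor toward its counterparts, so the
    learned features become insensitive to camera style."""
    a = F.normalize(anchor, dim=-1)      # unit-norm anchor embedding
    p = F.normalize(positives, dim=-1)   # unit-norm positive embeddings
    return (1.0 - p @ a).mean()          # 0 when all positives align with anchor

# Identical embeddings give zero loss; dissimilar ones give a positive loss.
anchor = torch.ones(4)
positives = torch.ones(3, 4)
loss = camera_invariance_loss(anchor, positives)
```

In training, `positives` would be the features of the CamStyle-generated images of the same target sample under the styles of the other cameras.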

Unsupervised Domain Adaptation
Generative Adversarial Networks
Self-Attention Modules
Overview of the Proposed Framework
Supervised Learning for Source Domain
Intra-Domain Learning
Camera-Aware Neighborhood Invariance
Style Transfer
Dual Attention Network
Position Attention Module
Channel Attention Module
Dataset and Evaluation Metrics
Deep Re-ID Model
Parameter Analysis
Ablation Study
Comparison with State-of-the-Art Methods
Conclusions

