Abstract
Owing to variations in the image capturing process, the discrepancy between the source and target sets poses a challenge for unsupervised domain adaptation (UDA) in person re-identification (re-ID). Given a labeled source training set and an unlabeled target training set, this paper focuses on improving the generalization ability of the re-ID model on the target testing set. The proposed method enforces two properties simultaneously: (1) camera invariance, achieved through positive learning between unlabeled target images and their camera-style-transferred counterparts; and (2) more robust and accurate feature extraction in the backbone network, achieved by adding a position-channel dual attention mechanism. The proposed model is built on a classic dual-stream network. Comparative experiments on three public benchmarks demonstrate the superiority of the proposed method.
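As a rough illustration of the position-channel dual attention idea, the following PyTorch sketch combines a position (spatial) attention branch with a channel attention branch and fuses them by element-wise summation. The module names, the channel-reduction factor of 8 in the position branch, and the summation fusion are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class PositionAttention(nn.Module):
    """Position (spatial) attention: each location attends to all others."""
    def __init__(self, in_dim):
        super().__init__()
        self.query = nn.Conv2d(in_dim, in_dim // 8, kernel_size=1)
        self.key = nn.Conv2d(in_dim, in_dim // 8, kernel_size=1)
        self.value = nn.Conv2d(in_dim, in_dim, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learnable residual weight

    def forward(self, x):
        b, c, h, w = x.size()
        q = self.query(x).view(b, -1, h * w).permute(0, 2, 1)  # B x HW x C'
        k = self.key(x).view(b, -1, h * w)                     # B x C' x HW
        attn = torch.softmax(torch.bmm(q, k), dim=-1)          # B x HW x HW
        v = self.value(x).view(b, -1, h * w)                   # B x C x HW
        out = torch.bmm(v, attn.permute(0, 2, 1)).view(b, c, h, w)
        return self.gamma * out + x

class ChannelAttention(nn.Module):
    """Channel attention: each channel attends to all other channels."""
    def __init__(self):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        b, c, h, w = x.size()
        q = x.view(b, c, -1)                                   # B x C x HW
        k = x.view(b, c, -1).permute(0, 2, 1)                  # B x HW x C
        attn = torch.softmax(torch.bmm(q, k), dim=-1)          # B x C x C
        v = x.view(b, c, -1)
        out = torch.bmm(attn, v).view(b, c, h, w)
        return self.gamma * out + x

class DualAttention(nn.Module):
    """Fuse the position and channel branches by element-wise summation."""
    def __init__(self, in_dim):
        super().__init__()
        self.pam = PositionAttention(in_dim)
        self.cam = ChannelAttention()

    def forward(self, x):
        return self.pam(x) + self.cam(x)
```

In practice such a module would typically be inserted after an intermediate stage of the backbone, for example `DualAttention(2048)` applied to the last ResNet-50 feature map before global pooling; where exactly it is placed in the authors' dual-stream network is not specified here.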
Highlights
Person re-identification is an active research topic with considerable practical value in computer vision
To verify the effectiveness of the attention mechanism, DukeMTMC-reID is used as the source domain and performance is evaluated on Market-1501
The final accuracy shows that the proposed model performs best only when the position attention mechanism and the channel attention mechanism are added together
Summary
Person re-identification (re-ID) is an active research topic with considerable practical value in computer vision. To alleviate the impact of camera style variation, Zhong et al. [17] applied CycleGAN (CamStyle) to generate camera-style-transferred images as data augmentation for person re-ID. However, because the target domain is unlabeled and lacks correspondingly strong constraints, such models cannot effectively suppress the impact of variations between different cameras (including viewing angle and background). In this case, nearest-neighbor search tends to select candidates captured by the same camera as the probe, and these candidates are often incorrect. The target domain is therefore divided into sub-domains according to the number of cameras, and image style transfer is performed between these sub-domains. The rest of this paper is structured as follows: Section 2 discusses related work; Section 3 presents the proposed method; Section 4 compares the proposed method with state-of-the-art methods and analyzes the experimental results; and Section 5 concludes the paper.
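To make the camera-invariance positive learning concrete, the sketch below assumes that each unlabeled target image has been transferred offline into the styles of the other cameras (e.g. with per-camera-pair CycleGAN generators, as in CamStyle), and that the backbone has produced features for both the original and the transferred version. The function name and the cosine-similarity form of the positive-pair loss are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def camera_invariance_loss(feat_real: torch.Tensor, feat_fake: torch.Tensor) -> torch.Tensor:
    """Positive-pair loss between target images and their camera-style-transferred counterparts.

    feat_real: (B, D) features of the original unlabeled target images.
    feat_fake: (B, D) features of the same images after transfer to another
               camera's style (generated offline, e.g. with CycleGAN/CamStyle).
    """
    feat_real = F.normalize(feat_real, dim=1)
    feat_fake = F.normalize(feat_fake, dim=1)
    # Cosine similarity between each image and its own style-transferred counterpart.
    pos_sim = (feat_real * feat_fake).sum(dim=1)
    # Pull each real/fake pair together: similarity should approach 1.
    return (1.0 - pos_sim).mean()
```

During training, each mini-batch of target images would be paired with a randomly chosen camera-style-transferred version of the same images, and a term of this kind would be combined with the supervised loss on the labeled source domain via a weighting hyperparameter.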