Abstract

Person re-identification (ReID) has attracted considerable research attention because of its wide range of applications. However, the difficulty of extracting robust features and the complexity of the feature extraction process make ReID hard to deploy in practice. In this paper, we adopt the Pyramid Vision Transformer (PVT) as the feature extraction backbone and, building on related work, propose a PVT-based ReID method. First, we establish a baseline model using strong techniques already verified on CNN-based ReID. Second, to further improve the robustness of the features extracted by the PVT backbone, we design two new modules: (1) a local feature clustering (LFC) module selects the most discrete local features by computing the distance between local and global features and clusters them individually, and (2) side information embeddings (SIE) encode non-visual information and feed it into the network during training to reduce its impact on the learned features. Experiments show that the proposed PVTReID achieves an mAP of 63.2% on MSMT17 and 80.5% on DukeMTMC-reID. In addition, we evaluate the inference speed of different methods and show that our method infers images faster. These results demonstrate that using PVT as the backbone together with the LFC and SIE modules improves inference speed while extracting robust features.
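To make the LFC idea concrete, the sketch below shows one way the "most discrete" local features could be selected by distance to the global feature. This is an illustrative sketch only, not the paper's implementation: the function name `select_discrete_local_features`, the use of cosine distance, the tensor shapes, and the parameter `k` are all assumptions introduced here for clarity.

```python
import torch
import torch.nn.functional as F


def select_discrete_local_features(local_feats, global_feat, k=4):
    """Hypothetical sketch of the LFC selection step described in the abstract.

    Measures how far each local (patch-level) feature lies from the global
    feature and keeps the k most distant ("most discrete") ones so that a
    separate branch could cluster them individually.

    local_feats: (B, N, D) patch features from the PVT backbone (assumed shape)
    global_feat: (B, D)    pooled global feature (assumed shape)
    """
    # Cosine distance between every local feature and the global feature
    local_norm = F.normalize(local_feats, dim=-1)            # (B, N, D)
    global_norm = F.normalize(global_feat, dim=-1)           # (B, D)
    dist = 1.0 - torch.einsum("bnd,bd->bn", local_norm, global_norm)  # (B, N)

    # Keep the k local features farthest from the global feature
    idx = dist.topk(k, dim=1).indices                        # (B, k)
    selected = torch.gather(
        local_feats, 1, idx.unsqueeze(-1).expand(-1, -1, local_feats.size(-1))
    )                                                        # (B, k, D)
    return selected, idx
```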
