Abstract

Extracting fine-grained features from person images has proven crucial in person re-identification (re-ID). Although convolutional neural networks (CNNs) have been very successful in person re-ID, their small receptive fields and downsampling operations cause information loss that existing CNNs cannot avoid. The multi-head attention modules in transformers can address these problems well. However, since the dicing operation destroys the spatial correlation between patches, transformers still lose some local features. In this paper, we propose the patch information supplement transformer (PIT) to extract fine-grained features in the dicing stage. A patch pyramid network (PPN) is introduced to solve the problem of local information loss: it divides the image into patches at different scales through the dicing operation and adds them together from top to bottom according to a pyramid structure. In addition, we insert a learnable identity information embedding (IDE) module to reduce the feature bias caused by clothing and camera perspective. Experiments verify the superiority and effectiveness of PIT compared to state-of-the-art methods.
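
The top-down patch fusion described for PPN can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the patch sizes, the embedding dimension, the projection, or the upsampling rule, so the values `(32, 16, 8)` and `64`, the random linear projection, the nearest-neighbour upsampling, and the function names `patchify` and `patch_pyramid` are all assumptions made for illustration.

```python
import numpy as np


def patchify(image, patch):
    """Dice an (H, W, C) image into non-overlapping patch x patch tiles.

    Returns an array of shape (H // patch, W // patch, patch * patch * C),
    i.e. one flattened tile per grid cell.
    """
    h, w, c = image.shape
    gh, gw = h // patch, w // patch
    tiles = image[: gh * patch, : gw * patch].reshape(gh, patch, gw, patch, c)
    return tiles.transpose(0, 2, 1, 3, 4).reshape(gh, gw, patch * patch * c)


def patch_pyramid(image, patch_sizes=(32, 16, 8), dim=64, seed=0):
    """Hypothetical pyramid fusion: embed patches at several scales, then add
    them from the coarsest level down to the finest one.

    Assumes patch_sizes is ordered coarse to fine, each size is a multiple of
    the next, and the image height and width are multiples of the largest size.
    """
    rng = np.random.default_rng(seed)
    levels = []
    for p in patch_sizes:
        tokens = patchify(image, p)
        # Random matrix standing in for a learned linear patch embedding.
        proj = rng.standard_normal((tokens.shape[-1], dim)) / np.sqrt(tokens.shape[-1])
        levels.append(tokens @ proj)  # (grid_h, grid_w, dim)

    fused = levels[0]
    for finer in levels[1:]:
        ry = finer.shape[0] // fused.shape[0]
        rx = finer.shape[1] // fused.shape[1]
        # Nearest-neighbour upsampling of the coarser grid to the finer grid.
        upsampled = fused.repeat(ry, axis=0).repeat(rx, axis=1)
        fused = finer + upsampled
    return fused.reshape(-1, dim)  # one token per finest-scale patch


if __name__ == "__main__":
    image = np.random.default_rng(1).random((256, 128, 3))
    print(patch_pyramid(image).shape)
```

In this sketch each fine-scale token receives the embedding of the coarser patches that contain it, which is one plausible way to restore the neighbourhood context that dicing at a single scale discards.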
