Abstract

In recent years, deep learning approaches to person re-identification have made great progress. However, many recent methods deploy ResNet or SENet as the backbone network, and these backbones were originally designed for classification tasks. Since the person re-identification task is essentially different from the classification task, the structure of the backbone network should be modified accordingly. In this paper, we propose a retrieval network based on a multi-scale backbone architecture that is specifically suited to the person re-identification task. By constructing hierarchical residual-like connections within a single residual block, the model learns multi-scale discriminative features of pedestrian images. Unlike many state-of-the-art methods that use complex network structures and concatenate multi-branch features, our proposed retrieval network is implemented using only global features, a simple triplet loss, and softmax with cross-entropy loss. Extensive experiments show that the proposed network has a stronger fine-grained pedestrian representation ability, leading to performance gains on person re-identification tasks. Our proposed network achieves a rank-1 accuracy of 96.03% on Market-1501 and 92.11% on DukeMTMC-reID while using only global features.
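The training objective named in the abstract combines a triplet loss with softmax cross-entropy. The following is a minimal sketch of how the two terms might be combined for a single sample. The margin of 0.3, the Euclidean distance, the equal weighting of the two terms, and all function names are illustrative assumptions, not values taken from the paper.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, using log-sum-exp for stability."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinge on d(anchor, positive) - d(anchor, negative) + margin.

    The margin value is an assumption made for this sketch.
    """
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, gap + margin)


def combined_loss(logits, label, anchor, positive, negative, margin=0.3):
    """Unweighted sum of the identity loss and the triplet loss (assumed)."""
    return cross_entropy(logits, label) + triplet_loss(
        anchor, positive, negative, margin
    )
```

Here `logits` would come from an identity classifier placed on top of the global feature, while `anchor`, `positive`, and `negative` are global features of three images, the first two sharing an identity.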

Highlights

  • Person re-identification is usually regarded as a subtask of digital image retrieval

  • We evaluate our proposed method on three large-scale person re-ID benchmark datasets and show the results of the Re-Net model compared with other state-of-the-art methods

  • In the Re-Net Bottleneck block, we replace a group of 3 × 3 filters with 5 groups of 3 × 3 filters, while connecting different filter groups in a hierarchical residual-like style
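The hierarchical residual-like connection described in the last highlight can be illustrated with a toy sketch. The exact wiring of the Re-Net Bottleneck block is not given in this summary, so the code assumes the following: the channels are split into equal groups, the first group is transformed directly, and every later group adds the previous group's output to its own input before being transformed. The function name `hierarchical_block` and the stand-in transforms are hypothetical; in a real network each transform would be a group of 3 × 3 convolution filters.

```python
def hierarchical_block(x, transforms):
    """Split x into len(transforms) equal groups and chain them.

    Assumed wiring: y_1 = K_1(x_1), y_i = K_i(x_i + y_{i-1}) for i > 1.
    Each K_i stands in for one group of 3 x 3 filters.
    """
    groups = len(transforms)
    if len(x) % groups != 0:
        raise ValueError("channel count must be divisible by the number of groups")
    width = len(x) // groups
    outputs = []
    prev = None
    for i, k in enumerate(transforms):
        chunk = x[i * width:(i + 1) * width]
        if prev is not None:
            # Residual-like link: feed the previous group's output forward.
            chunk = [c + p for c, p in zip(chunk, prev)]
        prev = k(chunk)
        outputs.extend(prev)  # concatenate the outputs of all groups
    return outputs


def double(chunk):
    """Toy transform standing in for a group of filters."""
    return [2 * v for v in chunk]


features = [1.0] * 10
out = hierarchical_block(features, [double] * 5)
# out == [2.0, 2.0, 6.0, 6.0, 14.0, 14.0, 30.0, 30.0, 62.0, 62.0]
```

Under this wiring, the output of the last group has passed through all five transforms while the output of the first has passed through only one, which is one way a single block can mix features at several scales.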


Summary

Introduction

Person re-identification (re-ID) is usually regarded as a subtask of digital image retrieval. Many of the latest re-ID models use ResNet [3] or SENet [4] as the backbone network, and these backbones were originally designed for classification tasks. Xie et al. [5] showed that image retrieval and image classification are essentially the same, because both tasks can be solved by measuring the similarity between images. Nevertheless, the classification task and the person re-ID task still differ in a few ways. In image classification, the input images depict different kinds of objects, such as cars and ships, whereas in person re-ID all inputs to the network are pedestrian images.
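Treating re-ID as retrieval by similarity can be sketched as follows: each query feature is compared against every gallery feature, the gallery is ranked, and rank-1 accuracy counts how often the top match shares the query's identity. The use of cosine similarity and all function names are assumptions made for illustration. Benchmark evaluation protocols typically also filter gallery images by camera, which this sketch omits.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_gallery(query, gallery):
    """Return gallery indices sorted from most to least similar to the query."""
    return sorted(
        range(len(gallery)),
        key=lambda i: cosine_similarity(query, gallery[i]),
        reverse=True,
    )


def rank1_accuracy(queries, query_ids, gallery, gallery_ids):
    """Fraction of queries whose top-ranked gallery image has the same identity."""
    hits = sum(
        gallery_ids[rank_gallery(q, gallery)[0]] == qid
        for q, qid in zip(queries, query_ids)
    )
    return hits / len(queries)
```

For example, with a gallery of two features `[[1.0, 0.0], [0.0, 1.0]]` labeled `[7, 8]`, the query `[0.9, 0.1]` ranks the first gallery image on top and counts as a rank-1 hit if its identity is 7.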

