Abstract

Person re-identification (Re-ID) is challenging due to a host of factors, including the variety of human poses, difficulties in aligning bounding boxes, and complex backgrounds. This paper proposes a new framework called EXAM (EXtreme And Moderate feature embeddings) for Re-ID tasks. EXAM performs discriminative feature learning with attention-based guidance during training. Here, “Extreme” refers to salient human features and “Moderate” refers to common human features. The two types of embeddings are computed by global max-pooling and global average-pooling, respectively, and are then jointly supervised by multiple triplet and cross-entropy loss functions. Within this end-to-end framework, inferring attention from the learned embeddings and discriminative feature learning are integrated and benefit from each other. Comparative experiments and ablation studies show that EXAM is effective and that its learned feature representation achieves state-of-the-art performance.
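The pooling-and-supervision scheme described in the abstract can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the Euclidean distance, the margin of 0.3, the bias-free linear classifier heads, the single triplet per call, and the equal weighting of the four loss terms are all hypothetical choices, and the attention-guidance component of EXAM is not modeled.

```python
import math
from typing import List, Sequence

# A feature map is indexed as [channel][row][col], e.g. a CNN backbone output for one image.
FeatureMap = List[List[List[float]]]


def global_max_pool(fmap: FeatureMap) -> List[float]:
    """'Extreme' embedding: strongest response of each channel over all positions."""
    return [max(v for row in channel for v in row) for channel in fmap]


def global_avg_pool(fmap: FeatureMap) -> List[float]:
    """'Moderate' embedding: mean response of each channel over all positions."""
    return [
        sum(v for row in channel for v in row) / (len(channel) * len(channel[0]))
        for channel in fmap
    ]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 0.3) -> float:
    """Hinge triplet loss: the same-identity pair should be closer than the
    different-identity pair by at least `margin` (margin value is an assumption)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Softmax cross-entropy for one sample, using the log-sum-exp trick for stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def linear(x: Sequence[float], weights: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical identity classifier head: one weight row per identity, no bias."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def joint_loss(anchor: FeatureMap, positive: FeatureMap, negative: FeatureMap,
               target: int, cls_extreme: Sequence[Sequence[float]],
               cls_moderate: Sequence[Sequence[float]], margin: float = 0.3) -> float:
    """Hypothetical equally weighted sum of triplet and cross-entropy terms
    applied to both the extreme and the moderate embeddings."""
    extreme = [global_max_pool(f) for f in (anchor, positive, negative)]
    moderate = [global_avg_pool(f) for f in (anchor, positive, negative)]
    loss = triplet_loss(*extreme, margin=margin) + triplet_loss(*moderate, margin=margin)
    loss += cross_entropy(linear(extreme[0], cls_extreme), target)
    loss += cross_entropy(linear(moderate[0], cls_moderate), target)
    return loss


# Usage: a 2-channel, 2x2 feature map.
fmap = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, -1.0], [5.0, 0.0]]]
print(global_max_pool(fmap))  # [4.0, 5.0]
print(global_avg_pool(fmap))  # [2.5, 1.0]
```

Max-pooling keeps only each channel's peak response, so it emphasizes the most salient local cue, while average-pooling summarizes the whole map; this contrast is the intuition behind pairing an "Extreme" and a "Moderate" embedding.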

Highlights

  • CUHK03: This dataset contains 14,097 outdoor images of 1,467 identities captured by six surveillance cameras on the Chinese University of Hong Kong (CUHK) campus; 767 identities with 7,368 images form the training set.
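If the identities and images outside the training set are taken to form the test split (an assumption; the test split is not described here), the test portion of CUHK03 works out as follows, where the symbols N are introduced only for illustration:

```latex
\begin{aligned}
N^{\text{test}}_{\text{id}}  &= 1467 - 767   = 700,\\
N^{\text{test}}_{\text{img}} &= 14097 - 7368 = 6729.
\end{aligned}
```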

Introduction

Person re-identification (Re-ID) has been widely studied to determine whether a person of interest has appeared elsewhere in images captured by different cameras [1,2,3]. Deep neural networks were originally developed for image classification [7], and their successful global feature learning strategy for classification was directly adopted by person Re-ID approaches. However, the learned global representation pays little attention to local details [8] and often has weak discriminative ability when different identities share similar common properties or when the same identity shows large intra-class differences [9]. In practice, the following difficulties arise: (1) imprecise pedestrian detection degrades global feature learning, e.g., Figure 1a; (2) changes in body posture make learning more difficult, e.g., Figure 1b; (3) unexpected occlusion makes the learned features irrelevant to the human body, e.g., Figure 1c; (4) cluttered backgrounds or multiple pedestrians with highly similar appearances make it difficult for the model to distinguish the target, e.g., Figure 1d,e; and (5) misaligned bounding boxes make the model sensitive to scale variation, e.g., Figure 1f.
