Abstract

The attention mechanism for person re-identification has been widely studied with deep convolutional neural networks. This mechanism works as a good complement to the global features extracted from an image of the entire human body. However, existing works mainly focus on discovering local parts with simple feature representations, such as global average pooling. Moreover, these works either require extra supervision, such as labeling of body joints, or pay little attention to the guidance of part learning, resulting in scattered activation of learned parts. Furthermore, existing works usually extract local features from different body parts via global average pooling and then concatenate them together as good global features. We find that local features acquired in this way contribute little to the overall performance. In this paper, we argue the significance of local part description and explore the attention mechanism from both local part discovery and local part representation aspects. For local part discovery, we propose a new constrained attention module to make the activated regions concentrated and meaningful without extra supervision. For local part representation, we propose a statistical-positional-relational descriptor to represent local parts from a fine-grained viewpoint. Extensive experiments are conducted to validate the overall performance, the effectiveness of each component, and the generalization ability. We achieve a rank-1 accuracy of 95.1% on Market1501, 64.7% on CUHK03, 87.1% on DukeMTMC-ReID, and 79.9% on MSMT17, outperforming state-of-the-art methods.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call