Abstract
Person re-identification (re-id) refers to matching people across disjoint camera views. Most person re-id methods extract discriminative features from whole images or fixed regions and build their metrics on those features. However, these methods ignore the discriminative information held by attention regions with temporal cues in a pedestrian image pair. In this paper, we propose recurrent models of visual co-attention that simulate human eye movement by focusing on a sequence of concurrent attention (co-attention) regions at the same locations in both images when comparing an image pair. Since reinforcement learning provides a flexible learning strategy for sequential decision-making, it is a natural fit for the temporal re-id co-attention learning task. The reward functions are designed to recursively optimize the prediction by rewarding or punishing the learning process. The recurrent models extract information from the sequence of attention regions. Finally, person re-id is performed using both the whole-image feature and the features from the recurrent models. Our contributions are: 1) a visual mechanism that dynamically locates the optimal co-attention regions to simulate the human re-id process; 2) reward functions for reinforcement learning that recursively optimize the prediction process; and 3) experimental results that demonstrate the advantages of our method over state-of-the-art methods.