Abstract

We focus on the person re-identification (re-id) task, whose goal is to automatically re-identify individuals across multiple non-overlapping cameras or the same camera across time. While most existing works rely solely on properties of the visual data, we take advantage of both visual and textual representations. Given images and natural language descriptions of the persons in the probe set, the re-id system is required to rank all the samples in the gallery set. We embed the visual representations and textual descriptions in a unified space, in which similar examples are mapped close to each other and dissimilar examples are pushed farther apart. Our premise is that strong semantic correlations generally exist between different persons; the unified space thus casts a person in the gallery set as a combination of the persons in the probe set. The model is trained in an end-to-end fashion. We conduct extensive experiments on the challenging i-LIDS, PRID-2011, CUHK03, and Market-1501 datasets and confirm that the proposed model achieves state-of-the-art performance.
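To make the unified-space idea concrete, the following is a minimal sketch of a joint visual-textual embedding trained with a bidirectional contrastive (hinge) loss. It is not the authors' exact architecture: the encoder inputs, feature dimensions, projection layers, and margin value are all illustrative assumptions, and the feature extractors are replaced by pre-extracted feature vectors.

```python
# A minimal sketch (assumed architecture, not the paper's exact model) of a
# joint visual-textual embedding with a contrastive objective, in PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointEmbedding(nn.Module):
    def __init__(self, img_dim=2048, txt_dim=300, embed_dim=512):
        super().__init__()
        # Project pre-extracted visual features (e.g., CNN pooling output)
        # and textual features (e.g., averaged word embeddings) into one
        # shared space; both input dimensions are hypothetical stand-ins.
        self.img_proj = nn.Linear(img_dim, embed_dim)
        self.txt_proj = nn.Linear(txt_dim, embed_dim)

    def forward(self, img_feat, txt_feat):
        # L2-normalize so cosine similarity reduces to a dot product.
        v = F.normalize(self.img_proj(img_feat), dim=-1)
        t = F.normalize(self.txt_proj(txt_feat), dim=-1)
        return v, t

def contrastive_loss(v, t, margin=0.2):
    # Pull matched (image, description) pairs together and push
    # mismatched pairs at least `margin` apart, in both directions.
    sim = v @ t.T                                 # pairwise similarities
    pos = sim.diag().unsqueeze(1)                 # matched-pair scores
    cost_t = (margin + sim - pos).clamp(min=0)    # image vs. wrong text
    cost_v = (margin + sim - pos.T).clamp(min=0)  # text vs. wrong image
    mask = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    return (cost_t.masked_fill(mask, 0) + cost_v.masked_fill(mask, 0)).mean()

# Toy usage with random features standing in for real encoder outputs.
model = JointEmbedding()
v, t = model(torch.randn(8, 2048), torch.randn(8, 300))
loss = contrastive_loss(v, t)
loss.backward()
```

At test time, ranking the gallery against a probe reduces to sorting by cosine similarity in this shared space, since both modalities are normalized into it.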
