Abstract

Person re-identification with natural language description is a process of retrieving the corresponding person’s image from an image dataset according to a text description of the person. The key challenge in this cross-modal task is to extract visual and text features and construct loss functions to achieve cross-modal matching between text and image. Firstly, we designed a two-branch network framework for person re-identification with natural language description. In this framework we include the following: a Bi-directional Long Short-Term Memory (Bi-LSTM) network is used to extract text features and a truncated attention mechanism is proposed to select the principal component of the text features; a MobileNet is used to extract image features. Secondly, we proposed a Cascade Loss Function (CLF), which includes cross-modal matching loss and single modal classification loss, both with relative entropy function, to fully exploit the identity-level information. The experimental results on the CUHK-PEDES dataset demonstrate that our method achieves better results in Top-5 and Top-10 than other current 10 state-of-the-art algorithms.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call