Abstract

Person reidentification (P-Reid) is an emerging research domain in information retrieval that has grown rapidly owing to its wide range of applications in pedestrian tracking and crime prevention. The primary goal of P-Reid is to recognize a person in multiview surveillance videos based on a previous appearance. Mainstream approaches apply fully supervised learning techniques, which scale poorly when deployed in complex real-world scenes because the lack of sufficient annotated data leads to overfitting. Further, optimizing these models for unlabeled data in real-time surveillance is challenging. To tackle these issues, an intelligent framework (LR-Net) is proposed, consisting of three tiers: fine-tuning (FT), a Siamese network (SN), and a fusion strategy (FS). In the first tier, a deep learning model that can handle both labeled and unlabeled data is fine-tuned for P-Reid. Next, with the assistance of transfer learning, an SN is proposed that has a strong discriminative capability in judging the similarity between a pair of images. Finally, a learning-to-rank strategy is applied to optimize the learning capability of the SN, in which a triplet network extracts spatial-temporal patterns from unlabeled samples. In addition, a Bayesian fusion model (BFM) is introduced to integrate the spatiotemporal and visual features, which yields improvements of 4.4%, 9.3%, and 0.8% in the matching score on the Market-1501, DukeMTMC-reID, and CUHK03 data sets, respectively. Experiments and an ablation study on the benchmark data sets empirically validate the proposed system, which obtains a high Rank-1 score compared with state-of-the-art (SOTA) methods.
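As a rough illustration of the triplet-based learning-to-rank idea mentioned above, the following minimal sketch computes the standard triplet margin loss on plain feature vectors and ranks a gallery by distance to a query. It is not the LR-Net implementation: the function names, the Euclidean distance, and the margin value of 0.3 are all assumptions chosen for illustration, and the feature vectors stand in for embeddings that a trained network would produce.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=0.3):
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin).

    The margin of 0.3 is an illustrative assumption, not a value from the paper.
    """
    d_ap = euclidean(anchor, positive)  # anchor vs. same-identity sample
    d_an = euclidean(anchor, negative)  # anchor vs. different-identity sample
    return max(0.0, d_ap - d_an + margin)


def rank_gallery(query, gallery):
    """Return gallery indices sorted from nearest to farthest from the query."""
    return sorted(range(len(gallery)), key=lambda i: euclidean(query, gallery[i]))


if __name__ == "__main__":
    anchor = [0.0, 0.0]
    positive = [0.0, 1.0]
    negative = [3.0, 4.0]
    # The negative is already far beyond the margin, so the loss is zero.
    print(triplet_loss(anchor, positive, negative))
    # Index 0 of the ranking is the Rank-1 match for the query.
    print(rank_gallery(anchor, [negative, positive, [1.0, 1.0]]))
```

The loss is zero once the different-identity sample is farther from the anchor than the same-identity sample by at least the margin, which is the ordering that the Rank-1 metric rewards at retrieval time.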
