Abstract

Person search involves localizing and re-identifying persons of interest captured by multiple, non-overlapping cameras. Recent approaches to person search are typically built on object detection frameworks to learn joint person representations for detection and re-identification. To this end, features extracted from pedestrian proposals are projected onto a unit hypersphere using L2 normalization, and positive proposals that sufficiently overlap with the ground truth are incorporated equally for training by exploiting an external lookup table (LUT). We have found that (1) applying L2 normalization without considering feature distributions can degrade the discriminative power of person representations, (2) positive proposals often depict distracting details, such as background clutter and person overlaps, and (3) person features in the LUT are updated only infrequently during training. To address these limitations, we propose a novel framework for person search, dubbed PLoPS, using a prototypical normalization layer, ProtoNorm, that calibrates features while considering the long-tail distribution across person IDs. PLoPS also entails a localization-aware learning scheme that prioritizes proposals better aligned with the ground truth. We further introduce a LUT calibration technique to continuously adjust the person features in the LUT. Experimental results and analysis on standard benchmarks demonstrate the effectiveness of PLoPS.
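For concreteness, the sketch below illustrates the conventional pipeline the abstract critiques: proposal features are L2-normalized onto the unit hypersphere and matched against an external lookup table of per-ID features. The class name `LookupTable`, the momentum-style update rule, and all hyperparameters are illustrative assumptions and do not reflect the paper's actual implementation (which instead introduces ProtoNorm and LUT calibration).

```python
import torch
import torch.nn.functional as F


def l2_normalize(feats, eps=1e-12):
    # Project proposal features onto the unit hypersphere (L2 normalization).
    return F.normalize(feats, p=2, dim=1, eps=eps)


class LookupTable:
    """Illustrative external lookup table (LUT) of per-ID person features.

    Assumption: each labeled proposal feature is momentum-averaged into its
    identity slot; the actual update and calibration scheme in the paper differs.
    """

    def __init__(self, num_ids, feat_dim, momentum=0.5):
        self.table = torch.zeros(num_ids, feat_dim)
        self.momentum = momentum

    def update(self, feats, ids):
        # Blend each labeled proposal feature into its ID slot, then
        # re-normalize so the stored entry stays on the unit hypersphere.
        for f, i in zip(feats, ids):
            self.table[i] = self.momentum * self.table[i] + (1 - self.momentum) * f
            self.table[i] = F.normalize(self.table[i], p=2, dim=0)

    def similarities(self, feats):
        # Cosine similarities between proposal features and all stored IDs.
        return feats @ self.table.t()


# Toy usage: 4 pedestrian proposals with 256-D features and 10 identities.
feats = l2_normalize(torch.randn(4, 256))
lut = LookupTable(num_ids=10, feat_dim=256)
lut.update(feats, ids=[3, 3, 7, 0])
scores = lut.similarities(feats)  # shape (4, 10)
```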
