Abstract

Person search, an increasingly important task in computer vision, performs pedestrian localization and re-identification simultaneously. Several studies incorporate anchor-free detectors into person search models and achieve good results. However, these methods neglect the contextual information of images, which is essential for improving performance. To address this problem, we propose a global-aware and local-aware enhancement network for person search. The stem network is constructed with hierarchical vision transformers to adaptively learn the global context. A local-aware enhancement module is proposed to learn the local context and fuse multi-level features. To facilitate adequate learning on limited training data, we design a symmetric online instance matching (SOIM) loss for training the whole model end to end. We conduct experiments on two benchmark datasets (CUHK-SYSU and PRW) and compare against state-of-the-art methods. The results demonstrate that our method achieves performance comparable to the state of the art.
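
The abstract does not give the formulation of the SOIM loss; the paper builds on the standard online instance matching (OIM) loss, which maintains a lookup table of identity prototypes and trains batch features with a softmax over cosine similarities. The sketch below illustrates only that standard OIM mechanism (the unlabeled-identity queue and the paper's symmetric modification are omitted); all class and parameter names are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F


class OIMLossSketch(torch.nn.Module):
    """Minimal sketch of an online instance matching (OIM)-style loss.

    A lookup table stores one L2-normalized prototype per labeled identity.
    Batch features are scored against all prototypes by cosine similarity and
    trained with softmax cross-entropy; prototypes are updated by momentum.
    The symmetric variant proposed in the paper is NOT reproduced here.
    """

    def __init__(self, feat_dim: int, num_ids: int,
                 temperature: float = 0.1, momentum: float = 0.5):
        super().__init__()
        self.temperature = temperature
        self.momentum = momentum
        # Non-trainable lookup table of identity prototypes.
        self.register_buffer("lut", torch.zeros(num_ids, feat_dim))

    def forward(self, feats: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        feats = F.normalize(feats, dim=1)
        # Cosine similarities between batch features and all identity prototypes.
        logits = feats @ self.lut.t() / self.temperature
        loss = F.cross_entropy(logits, labels)
        # Momentum update of the prototypes for identities seen in this batch.
        with torch.no_grad():
            for f, y in zip(feats, labels):
                self.lut[y] = F.normalize(
                    self.momentum * self.lut[y] + (1.0 - self.momentum) * f, dim=0)
        return loss


# Usage example with random features (512-dim embeddings, 100 identities).
if __name__ == "__main__":
    criterion = OIMLossSketch(feat_dim=512, num_ids=100)
    feats = torch.randn(8, 512)
    labels = torch.randint(0, 100, (8,))
    print(criterion(feats, labels).item())
```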
