Vision Shared and Representation Isolated Network for Person Search

Yang Liu,Feilong Wang,Chengyu Kong,Shenglan Liu,Yingping Li,Yuqiu Kong

doi:10.24963/ijcai.2022/170

Abstract

Person search is a widely-concerned computer vision task that aims to jointly solve the problems of pedestrian detection and person re-identification in panoramic scenes. However, the pedestrian detection focuses on the consistency of pedestrians, while the person re-identification attempts to extract the discriminative features of pedestrians. The inevitable conflict greatly restricts the researches on the one-stage person search methods. To address this issue, we propose a Vision Shared and Representation Isolated (VSRI) network to decouple the two conflicted subtasks simultaneously, through which two independent representations are constructed for the two subtasks. To enhance the discrimination of the re-ID representation, a Multi-Level Feature Fusion (MLFF) module is proposed. The MLFF adopts the Spatial Pyramid Feature Fusion (SPFF) module to obtain diverse features from the stem network. Moreover, the multi-head self-attention mechanism is employed to construct a Multi-head Attention Driven Extraction (MADE) module and the cascaded convolution unit is adopted to devise a Feature Decomposition and Cascaded Integration (FDCI) module, which facilitates the MLFF to obtain more discriminative representations of the pedestrians. The proposed method outperforms the state-of-the-art methods on the mainstream datasets.

Full Text