Abstract

The key to efficient person search is jointly localizing pedestrians and learning discriminative representations for person re-identification (re-ID). Recent task-joint models build separate detection and re-ID branches on top of a shared region feature extraction network, where the large receptive fields of the neurons introduce redundant background information into the downstream re-ID task. Our diagnostic analysis indicates that such task-joint models suffer a considerable performance drop when the background is replaced or removed. In this work, we propose a subnet, termed the bottom-up fusion (BUF) network, that fuses bounding-box features pooled from multiple ConvNet stages in a bottom-up manner. With only a few additional parameters, BUF leverages multi-level features with different receptive field sizes to mitigate the background-bias problem. Moreover, a newly introduced segmentation head generates a foreground probability map that guides the network to focus on foreground regions; the resulting foreground attention module (FAM) enhances the foreground features. Extensive experiments on PRW and CUHK-SYSU validate the effectiveness of the proposed components. Our Bottom-Up Foreground-Aware Feature Fusion (BUFF) network achieves considerable gains over the state of the art on PRW and competitive performance on CUHK-SYSU.
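The abstract describes two mechanisms, bottom-up fusion of multi-stage RoI features and a foreground attention module driven by a segmentation head, without giving implementation details. The PyTorch sketch below is one plausible rendering under stated assumptions: the module names, channel widths, RoI spatial size, and the residual re-weighting in ForegroundAttention are illustrative choices, not the authors' code.

    import torch
    import torch.nn as nn

    class BottomUpFusion(nn.Module):
        # Fuses RoI features pooled from multiple ConvNet stages in a
        # bottom-up manner: shallow features (small receptive fields, less
        # background contamination) flow upward to refine deeper ones.
        def __init__(self, in_channels=(256, 512, 1024), out_channels=256):
            super().__init__()
            # 1x1 convs project each stage's RoI features to a common width.
            self.lateral = nn.ModuleList(
                nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels
            )
            # 3x3 convs smooth each fused level before it is passed upward.
            self.smooth = nn.ModuleList(
                nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
                for _ in in_channels[1:]
            )

        def forward(self, roi_feats):
            # roi_feats: per-stage RoI-aligned features, shallow to deep,
            # each of shape (num_boxes, C_i, H, W) with a shared spatial size.
            x = self.lateral[0](roi_feats[0])
            for lat, smooth, feat in zip(self.lateral[1:], self.smooth,
                                         roi_feats[1:]):
                x = smooth(x + lat(feat))
            return x

    class ForegroundAttention(nn.Module):
        # Predicts a per-pixel foreground probability map from the fused
        # RoI features and re-weights them with it (assumed residual form).
        def __init__(self, channels=256):
            super().__init__()
            self.seg_head = nn.Sequential(
                nn.Conv2d(channels, channels, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, 1, kernel_size=1),  # foreground logits
            )

        def forward(self, x):
            prob = torch.sigmoid(self.seg_head(x))  # (N, 1, H, W) in [0, 1]
            return x * (1.0 + prob)  # emphasize foreground, keep residual path

    # Hypothetical usage: RoI features from three ResNet stages for 8 boxes.
    rois = [torch.randn(8, c, 14, 14) for c in (256, 512, 1024)]
    out = ForegroundAttention(256)(BottomUpFusion()(rois))  # (8, 256, 14, 14)

The bottom-up ordering lets shallow features, whose small receptive fields carry less background context, refine the deeper, more semantic ones; the segmentation head's probability map then down-weights whatever background responses remain.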
