Abstract

In real-world applications, pedestrian detection faces a number of challenges that impede its progress: scale variance, cluttered backgrounds, and ambiguous pedestrian features are the main culprits of detection failures. Existing studies suggest that consistent feature fusion, semantic context mining, and exploiting inherent pedestrian attributes are feasible remedies. In this paper, to tackle these prevalent problems, we propose an anchor-free pedestrian detector, named the context and attribute perception network (CAPNet). In particular, we first generate features with consistent, well-defined semantics and rich local details by introducing a feature extraction module with a multi-stage, parallel-stream structure. Then, a global feature mining and aggregation (GFMA) network is proposed to implicitly reconfigure, reassign, and aggregate features so as to suppress irrelevant background features. Finally, to introduce more heuristic guidance into the network, we improve the detection head with an attribute-guided multiple receptive field (AMRF) module, which leverages pedestrian shape as an attribute to guide learning. Experimental results demonstrate that introducing context and attribute perception greatly facilitates detection. As a result, CAPNet achieves new state-of-the-art performance on the Caltech and CityPersons datasets.
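For readers who want a concrete picture of how the three described components could fit together, below is a minimal, hypothetical PyTorch sketch. The backbone stand-in, module internals, channel counts, and anchor-free output layout are all assumptions for illustration; the abstract does not specify the authors' implementation.

```python
# Hypothetical sketch of the CAPNet pipeline described in the abstract.
# All module internals are placeholders (assumptions), not the authors' code.
import torch
import torch.nn as nn

class GFMA(nn.Module):
    """Global feature mining and aggregation (placeholder): mixes a global
    context signal back into the feature map to suppress background responses."""
    def __init__(self, channels):
        super().__init__()
        self.context = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),           # global context vector
            nn.Conv2d(channels, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.context(x)             # reweight (reassign) features

class AMRFHead(nn.Module):
    """Attribute-guided multiple receptive field head (placeholder): parallel
    dilated branches stand in for receptive fields shaped by the pedestrian
    attribute (tall, narrow silhouette)."""
    def __init__(self, channels, num_outputs):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d)
            for d in (1, 2, 3)                  # multiple receptive fields
        ])
        self.predict = nn.Conv2d(channels, num_outputs, 1)

    def forward(self, x):
        fused = sum(branch(x) for branch in self.branches)
        return self.predict(torch.relu(fused))

class CAPNet(nn.Module):
    def __init__(self, channels=256):
        super().__init__()
        # Stand-in for the multi-stage, parallel-stream feature extractor.
        self.backbone = nn.Conv2d(3, channels, 7, stride=4, padding=3)
        self.gfma = GFMA(channels)
        # Anchor-free outputs: center heatmap (1) + scale (1) + offset (2).
        self.head = AMRFHead(channels, num_outputs=4)

    def forward(self, image):
        features = self.backbone(image)
        features = self.gfma(features)
        return self.head(features)

# Usage: one forward pass on a dummy image.
net = CAPNet()
out = net(torch.randn(1, 3, 480, 640))
print(out.shape)  # torch.Size([1, 4, 120, 160])
```

The sketch keeps the abstract's ordering: consistent feature extraction first, global context reweighting second, and a shape-aware multi-receptive-field head last, with anchor-free center/scale/offset maps as the final prediction.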
