Pedestrian detection in crowded scenes is a challenging problem in computer vision, where occlusion poses a major difficulty. In this paper, we propose a novel context-aware feature learning method for detecting pedestrians in crowds, aiming to make better use of context information to handle occlusion. Unlike most current pedestrian detectors, which extract context information from only a single, fixed region, we develop a new pixel-level context embedding module that integrates multi-cue context into a deep CNN feature hierarchy, providing access to the context of various regions through multi-branch convolution layers with different receptive fields. In addition, to exploit the distinctive visual characteristics formed by pedestrians that appear in groups and occlude each other, we propose a novel instance-level context prediction module, implemented as a two-person detector, to improve one-person detection performance. With these strategies, we obtain an efficient and lightweight detector that can be trained end-to-end. We evaluate the proposed approach on two popular pedestrian detection datasets, Caltech and CityPersons. Extensive experimental results demonstrate the effectiveness of the proposed method, especially under heavy occlusion.
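The abstract does not specify the exact design of the pixel-level context embedding module, but a minimal sketch of the general idea it names, multi-branch convolution layers with different receptive fields fused into the backbone feature hierarchy, might look like the following. The branch count, the use of dilated convolutions to vary receptive fields, and the residual fusion are all illustrative assumptions, not the authors' actual architecture.

```python
import torch
import torch.nn as nn

class PixelContextEmbedding(nn.Module):
    """Illustrative multi-branch context embedding (assumed design).

    Parallel 3x3 convolutions with different dilation rates give each
    branch a different receptive field; their outputs are concatenated
    and fused back into the input feature map.
    """
    def __init__(self, in_channels, branch_channels=64, dilations=(1, 2, 4)):
        super().__init__()
        # One branch per dilation rate; padding = dilation keeps spatial size.
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_channels, branch_channels, kernel_size=3,
                          padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(branch_channels),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        ])
        # 1x1 conv fuses the concatenated multi-receptive-field features.
        self.fuse = nn.Conv2d(branch_channels * len(dilations),
                              in_channels, kernel_size=1)

    def forward(self, x):
        ctx = torch.cat([branch(x) for branch in self.branches], dim=1)
        return x + self.fuse(ctx)  # residual fusion into the feature hierarchy

# Usage: embed context into a backbone feature map.
feat = torch.randn(1, 256, 64, 32)
module = PixelContextEmbedding(in_channels=256)
out = module(feat)
assert out.shape == feat.shape  # same shape, context-enriched features
```

The instance-level module described in the abstract would sit at the detection-head level instead, predicting two-person boxes whose outputs refine single-person detections; its structure is not detailed enough in the abstract to sketch reliably.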