Abstract
In this paper, we address the challenging problem of detecting pedestrians who are heavily occluded or far from the camera. Unlike most existing pedestrian detection methods, which use only coarse-resolution feature maps with fixed receptive fields, our approach exploits multi-grained deep features to make the detector more robust to the visible parts of occluded pedestrians and to small-size targets. Specifically, we jointly train a scale-aware network and a human parsing network in a semi-supervised manner using only bounding box annotations. We carefully design the scale-aware network to predict pedestrians of particular scales from the most appropriate feature maps, by matching their receptive fields to the target sizes. The human parsing network generates a fine-grained attentional map that guides the detector to focus on the visible parts of occluded pedestrians and on small-size instances. Both networks are computed in parallel and form a unified single-stage pedestrian detector, which achieves a favorable trade-off between accuracy and speed. Experiments on two challenging benchmarks, Caltech and KITTI, demonstrate the effectiveness of our proposed approach, which, in addition, runs 2× faster than competitive methods.
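To make the architectural idea concrete, the following is a minimal PyTorch sketch of the two parallel branches the abstract describes: scale-specific detection heads attached to feature maps with different receptive fields, modulated elementwise by a fine-grained attention map from a parsing-style head. All module names, layer widths, anchor counts, and the stem structure are illustrative assumptions, not the paper's actual networks or training scheme.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionGuidedDetectorSketch(nn.Module):
    """Hypothetical sketch: multi-scale detection features modulated by a
    fine-grained attention map from a parallel parsing-style branch.
    Layer sizes and structure are assumptions for illustration only."""

    def __init__(self, in_channels=3, feat_channels=64, num_anchors=4):
        super().__init__()
        # Shared stem producing a base feature map (stand-in for a backbone).
        self.stem = nn.Sequential(
            nn.Conv2d(in_channels, feat_channels, 3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )
        # Scale-aware branches: the second has a larger receptive field
        # (extra strided conv), intended for larger pedestrians.
        self.small_branch = nn.Conv2d(feat_channels, feat_channels, 3, padding=1)
        self.large_branch = nn.Sequential(
            nn.Conv2d(feat_channels, feat_channels, 3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(feat_channels, feat_channels, 3, padding=1),
        )
        # Parsing-style head: a single-channel attention map meant to
        # highlight visible pedestrian parts (the paper trains this
        # semi-supervised from boxes; here it is just a plain conv head).
        self.parsing_head = nn.Conv2d(feat_channels, 1, 1)
        # Per-scale detection heads: 1 class score + 4 box offsets per anchor.
        self.det_small = nn.Conv2d(feat_channels, num_anchors * 5, 1)
        self.det_large = nn.Conv2d(feat_channels, num_anchors * 5, 1)

    def forward(self, x):
        base = self.stem(x)
        attn = torch.sigmoid(self.parsing_head(base))       # values in (0, 1)
        f_small = self.small_branch(base) * attn             # focus on visible parts
        f_large = self.large_branch(base)
        # Resize the attention map to the coarser scale before modulating it too.
        attn_lg = F.interpolate(attn, size=f_large.shape[-2:],
                                mode="bilinear", align_corners=False)
        f_large = f_large * attn_lg
        return self.det_small(f_small), self.det_large(f_large), attn


if __name__ == "__main__":
    model = AttentionGuidedDetectorSketch()
    preds_small, preds_large, attn = model(torch.randn(1, 3, 128, 128))
    print(preds_small.shape, preds_large.shape, attn.shape)
```

Because both branches consume the same base feature map and involve no sequential proposal stage, they can run in parallel in a single forward pass, which is consistent with the single-stage, speed-oriented design the abstract claims.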