Abstract

Pedestrian detection has made breakthroughs since the rise of convolutional neural networks. However, it still faces several challenging problems, including differences between datasets, small pedestrian targets, and occlusions between pedestrians. To deal with these problems, we propose a novel convolutional network architecture, named the multi-scale cross-layer fusion and center position network (MCF-CP-NET). A new backbone unit is designed that introduces channel-wise attention into improved aggregated residual transformations for effective feature extraction. We select suitable anchor settings for pedestrian detection datasets to address the problem of dataset differences. A feature pyramid sub-network with cross-layer fusion is developed for better detection of small pedestrian targets, where cross-layer connections reduce information loss and the dissipation of low-level marginal features, and better fuse low- and high-level features. To better detect occluded pedestrians, we add a center position branch to the localization regression sub-network of MCF-CP-NET; it predicts a centrality index for each localization box to obtain a center score, which further refines the scores used in non-maximum suppression. Experiments show that, compared with the state of the art, MCF-CP-NET improves average precision and recall by 1.2% and 0.7%, respectively, on the person class of the Pascal VOC2007 dataset, and by 1.6% and 0.1% on the WiderPerson dataset.
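The abstract states that the center position branch produces a center score that refines the scores used in non-maximum suppression. The following is a minimal NumPy sketch of that general idea, assuming an FCOS-style centerness definition and a simple multiplicative combination of classification score and center score; the actual branch, combination rule, and thresholds used in MCF-CP-NET may differ.

    import numpy as np

    def centerness(l, t, r, b):
        # Centrality index of a location inside its box: close to 1 near the
        # box centre, close to 0 near the border (FCOS-style definition).
        return np.sqrt((np.minimum(l, r) / np.maximum(l, r)) *
                       (np.minimum(t, b) / np.maximum(t, b)))

    def nms(boxes, scores, iou_thr=0.5):
        # Greedy non-maximum suppression on [x1, y1, x2, y2] boxes.
        order = scores.argsort()[::-1]
        keep = []
        while order.size > 0:
            i = order[0]
            keep.append(i)
            xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
            yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
            xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
            yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
            inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
            area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
            area_o = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                     (boxes[order[1:], 3] - boxes[order[1:], 1])
            iou = inter / (area_i + area_o - inter)
            order = order[1:][iou <= iou_thr]
        return keep

    # Two overlapping pedestrian boxes: the first has the higher classification
    # score, but its predicted location lies off-centre, so its centre score is low.
    boxes = np.array([[10., 10., 60., 110.],
                      [15., 12., 65., 115.]])
    cls_scores    = np.array([0.90, 0.85])
    center_scores = np.array([centerness(10., 40., 40., 60.),
                              centerness(24., 50., 26., 53.)])

    # Hypothetical combination: the centre score modulates the NMS score,
    # so the better-centred box survives suppression.
    final_scores = cls_scores * center_scores
    print(nms(boxes, final_scores))   # keeps the second box

In this toy case, plain NMS on the classification scores would keep the off-centre box and suppress the well-centred one; re-weighting by the center score reverses that outcome, which is the behaviour the center position branch is intended to encourage for occluded pedestrians.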
