Abstract

Existing pedestrian detection algorithms cannot effectively extract features of heavily occluded targets, which lowers detection accuracy. To address heavy occlusion in crowds, we propose a multi-scale feature pyramid network based on ResNet (MFPN) that enhances the features of occluded targets and improves detection accuracy. MFPN includes two modules: a double feature pyramid network (FPN) integrated with ResNet (DFR) and repulsion loss of minimum (RLM). The double FPN improves the architecture to further enhance the semantic information and contours of occluded pedestrians, and provides a new way to extract features of occluded targets. The features extracted by our network are more separated and clearer, especially for heavily occluded pedestrians. Repulsion loss is introduced into the loss function to keep predicted boxes away from the ground truths of unrelated targets. In experiments on the public CrowdHuman dataset, we obtain 90.96% AP, which is the best performance and a gain of 5.16% AP over the FPN-ResNet50 baseline. Compared with state-of-the-art works, our method improves the performance of the pedestrian detection system.

Highlights

  • The main contributions of this paper are two-fold: (1) We propose a novel feature extraction network called double feature pyramid network (FPN) integrated with ResNet (DFR) to enhance the semantic information and contours of occluded pedestrians, and to simplify the network structure and the parameters.

  • (2) We introduce the concept of repulsion loss of minimum (RLM) to keep predicted boxes away from the ground truths of other pedestrians, which can monitor the learning process of predicted boxes.

  • We propose a regression loss termed Repulsion Loss of Minimum (RLM), an improvement over the original loss function in [41], which uses only the first part of our loss function.
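The idea of pushing predicted boxes away from the ground truths of other pedestrians can be illustrated with the ground-truth repulsion term of the original repulsion loss. The sketch below is not the paper's RLM, whose exact form is not given in this summary. It assumes axis-aligned boxes given as (x1, y1, x2, y2), assumes that the non-target ground truth for each prediction has already been selected, and uses hypothetical function names (`iog`, `smooth_ln`, `rep_gt_term`).

```python
import math

def iog(pred, gt):
    """Intersection over ground-truth area for boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    inter_h = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    gt_area = (gt[2] - gt[0]) * (gt[3] - gt[1])
    return inter_w * inter_h / gt_area if gt_area > 0 else 0.0

def smooth_ln(x, sigma=0.5):
    """Smooth-ln penalty: logarithmic up to sigma, linear beyond it (0 <= sigma < 1)."""
    if x <= sigma:
        return -math.log(1.0 - x)
    return (x - sigma) / (1.0 - sigma) - math.log(1.0 - sigma)

def rep_gt_term(pred_boxes, repulsion_gts, sigma=0.5):
    """Mean penalty over predictions.

    repulsion_gts[i] is assumed to be the non-target ground truth that
    overlaps pred_boxes[i] the most (selection is not shown here).
    """
    if not pred_boxes:
        return 0.0
    total = sum(smooth_ln(iog(p, g), sigma)
                for p, g in zip(pred_boxes, repulsion_gts))
    return total / len(pred_boxes)
```

The penalty is zero when a prediction does not touch the neighbouring ground truth and grows as the overlap increases, which is the behaviour the highlights describe. In a full detector this term would be added to the usual regression loss with a weighting factor.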

Summary

Introduction

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations. The CrowdHuman dataset [3] has an average of 23 pedestrians per image, with various levels of occlusion. It is difficult for general object detectors to handle this kind of problem. We propose a novel scheme called MFPN for heavily occluded pedestrian detection. The main contributions of this paper are two-fold: (1) We propose a novel feature extraction network called double FPN integrated with ResNet (DFR) to enhance the semantic information and contours of occluded pedestrians, and to simplify the network structure and the parameters. The remainder of this paper is organized as follows: In Section 2, the methods of pedestrian detection in crowded scenes are introduced.
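For readers unfamiliar with feature pyramids, the top-down merge used by a standard FPN can be sketched as follows. This is a toy illustration of the generic FPN idea only, not the paper's double FPN (DFR), whose structure is not described in this summary. It assumes single-channel feature maps stored as lists of rows, each level half the size of the previous one, and it omits the lateral 1x1 and smoothing 3x3 convolutions of a real FPN. All function names are hypothetical.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2-D feature map (list of rows)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each value twice
        out.append(wide)
        out.append(list(wide))                     # repeat each row twice
    return out

def add_maps(a, b):
    """Element-wise sum of two feature maps with identical shapes."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def top_down_merge(laterals):
    """Merge maps ordered finest to coarsest; returns merged maps in the same order."""
    merged = [laterals[-1]]  # the coarsest level is passed through unchanged
    for lateral in reversed(laterals[:-1]):
        # Upsample the coarser merged map and add it to the finer lateral map.
        merged.append(add_maps(lateral, upsample2x(merged[-1])))
    merged.reverse()
    return merged

# Three toy levels: 4x4, 2x2 and 1x1.
c2 = [[1] * 4 for _ in range(4)]
c3 = [[2] * 2 for _ in range(2)]
c4 = [[3]]
p2, p3, p4 = top_down_merge([c2, c3, c4])
```

Each finer level receives the semantic content of the coarser levels through the upsampled sum, which is why pyramid features help with small or partially visible targets.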

Related Works
Materials and Methods
Architecture Network
DFR Network
Repulsion Loss of Minimum
Experiments
Datasets
Detailed Settings
Comparison of Feature Maps
Ablation Study
Comparison of Previous Works
Findings
Conclusions