Abstract

The automatic detection and tracking of pedestrians under high-density conditions is a challenging task for both computer vision and pedestrian flow studies. Collecting pedestrian data is fundamental to the modeling and practical implementation of crowd management. Although many methods exist for detecting pedestrians, they cannot be easily adopted in high-density situations. Therefore, we utilize an emerging method based on deep learning. Based on the top-view video data of several pedestrian flow experiments recorded by an unmanned aerial vehicle (UAV), we produce our own training datasets. We train the detection model using Yolo v3, one of the most popular deep learning detection models of recent years. The detection results are good; e.g., the precisions, recalls, and F1 scores can exceed 0.95 even when the pedestrian density is as high as 9.0 ped/m². This approach could be used for other pedestrian flow experiments or field data with similar configurations, and it can also be useful for automatic crowd density estimation.

Highlights

  • In most pedestrian flow experiments, the participants are usually required to wear markers such as caps for the convenience of measuring positions and velocities [6, 9, 10, 11, 12, 13, 14].

  • Since the pretrained labels are primarily annotated using real-life images from side-view cameras, the performance is poor when such models are directly applied to many pedestrian flow experiments, especially when the cameras are perpendicular to the ground. The features of pedestrians in top-view footage are quite different from those seen from the side view. Therefore, we have to train a new model with samples of the various caps found in the video data.

  • We open-source a series of training datasets for pedestrians wearing caps, extracted from top-view unmanned aerial vehicle (UAV) videos. The reproducibility of our methods is demonstrated by using another dataset for validation; a sketch of the label format used for such training data follows this list.
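
The Darknet framework commonly used to train Yolo v3 expects one plain-text label file per image, with one normalized bounding box per line. The following Python sketch shows how pixel-space cap annotations from a UAV frame could be converted into that format; the file names and box coordinates are hypothetical, and this illustrates the label layout rather than the authors' actual annotation pipeline.

```python
# Minimal sketch: write one frame's cap annotations in the Darknet/Yolo label
# format (class x_center y_center width height, all normalized to [0, 1]).
def write_yolo_labels(label_path, boxes_px, img_w, img_h, class_id=0):
    """boxes_px: list of (x_min, y_min, x_max, y_max) boxes in pixels."""
    with open(label_path, "w") as f:
        for x1, y1, x2, y2 in boxes_px:
            xc = (x1 + x2) / 2.0 / img_w   # normalized box center
            yc = (y1 + y2) / 2.0 / img_h
            bw = (x2 - x1) / float(img_w)  # normalized box size
            bh = (y2 - y1) / float(img_h)
            f.write(f"{class_id} {xc:.6f} {yc:.6f} {bw:.6f} {bh:.6f}\n")

# Hypothetical example: two caps annotated in a 1920x1080 UAV frame.
write_yolo_labels("frame_0001.txt",
                  [(100, 200, 140, 240), (500, 610, 542, 655)],
                  1920, 1080)
```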

Summary

A Brief Introduction to Yolo v3

Yolo v3 introduces predictions across scales by using the concept of feature pyramid networks. It predicts boxes at three different scales and extracts features at each of those scales. At scale 2, a 16 × 16 feature map is added, which improves the accuracy of detecting medium objects. At scale 3, a 32 × 32 feature map is used, which makes the detection accuracy for small objects similar to that for medium objects. Due to these improvements, Yolo v3 achieves accuracy on the COCO dataset similar to that of RetinaNet, while being nearly four times faster.
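
As an illustration of how a trained Yolo v3 model produces detections at these three scales, the following Python sketch runs inference with OpenCV's DNN module. The configuration, weight, and image file names are placeholders, and the thresholds are typical defaults rather than the values used in the paper.

```python
# Minimal sketch: run a trained Yolo v3 model on one top-view frame with
# OpenCV's DNN module. File names below are placeholders, not the authors'
# actual artifacts.
import cv2
import numpy as np

net = cv2.dnn.readNetFromDarknet("yolov3-caps.cfg", "yolov3-caps.weights")
out_layers = net.getUnconnectedOutLayersNames()  # the three Yolo output scales

frame = cv2.imread("frame.jpg")
h, w = frame.shape[:2]

# Scale pixel values to [0, 1] and resize to the network input size.
blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)
outputs = net.forward(out_layers)  # one array of candidate boxes per scale

boxes, confidences = [], []
for output in outputs:
    for det in output:
        scores = det[5:]                     # per-class scores
        confidence = float(scores[np.argmax(scores)])
        if confidence > 0.5:
            # Box is predicted as a normalized center and size.
            cx, cy, bw, bh = det[0:4] * np.array([w, h, w, h])
            boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
            confidences.append(confidence)

# Non-maximum suppression merges overlapping boxes, which matters in dense crowds.
keep = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)
for i in np.array(keep).flatten():
    x, y, bw, bh = boxes[i]
    cv2.rectangle(frame, (x, y), (x + bw, y + bh), (0, 255, 0), 2)
cv2.imwrite("detections.jpg", frame)
```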

The Video Data and Training Process
Results and Evaluations
Some Notes about the Applications
Conclusions