Abstract

Pedestrian counting from unconstrained images is an important task in applications such as resource management, transportation engineering, urban design, and advertising, but it is made difficult by inter‐occlusion, cross‐scene variation, scale variation, and scene perspective distortion. Traditional image‐based methods suffer from all of these factors, and the performance of conventional sensor‐based methods, such as those using Kinect and laser, degrades as the number of pedestrians and their distance from the device increase. To address these challenges, this paper proposes a new network model that uses stacked multicolumn convolutional neural networks (CNNs) for pedestrian counting. Head features are used in place of whole‐body features to cope with severe occlusion, and multicolumn CNNs are chosen to handle scale variation and scene perspective distortion. In addition, a pretrained VGG‐16 is used to generate deeper, more detailed features and to expand the receptive field of the model. Extensive analysis and experiments on the major current pedestrian counting datasets show that the proposed network model has considerable advantages over other state‐of‐the‐art models in pedestrian counting tasks, and that it also improves the training process. Moreover, the differences between the generated density map and the ground‐truth density map are visualized and analyzed quantitatively to demonstrate the feasibility of the model.
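
To make the density‐map idea concrete, the following is a minimal sketch of how a ground‐truth density map can be built from head annotations so that its sum equals the pedestrian count. It is not the paper's own procedure, which the abstract does not specify: the function name `density_map`, the fixed Gaussian width `sigma`, and the example head coordinates are all assumptions made for illustration.

```python
import numpy as np

def density_map(head_points, height, width, sigma=4.0):
    """Sum one unit-mass Gaussian per annotated head position (row, col).

    Hypothetical helper: a fixed sigma is an assumption, not the paper's setting.
    """
    rows = np.arange(height, dtype=np.float64)[:, None]
    cols = np.arange(width, dtype=np.float64)[None, :]
    density = np.zeros((height, width), dtype=np.float64)
    for r, c in head_points:
        kernel = np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2.0 * sigma ** 2))
        # Renormalize inside the image so each head contributes exactly 1.
        density += kernel / kernel.sum()
    return density

# Illustrative head annotations (row, col) in a 64 x 80 image.
heads = [(10, 12), (30, 40), (5, 60)]
dmap = density_map(heads, 64, 80)

# The count estimate is the sum (discrete integral) of the density map.
estimated_count = dmap.sum()
```

Under these assumptions, a network that regresses such a map can be compared with the ground truth both visually, map against map, and numerically, by comparing the two sums.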
