Multi-Scale Pooling In Deep Neural Networks For Dense Crowd Estimation

Ali Raza Radhan, Ghulam Hussain, Fareed Ahmed Jokhio, Arsalan Ahmed, Kamran Javed

doi:10.30537/sjet.v5i1.1023

Abstract

State-of-the-art methods for counting people in densely crowded places fail to estimate crowd density accurately, for the following reasons. They typically apply the same filters over the complete image or over large image patches, and only afterwards compensate for perspective distortion by estimating local scale. This is achieved by training an additional classifier that selects the optimal kernel size from a limited set of choices. As a result, these methods are not end-to-end trainable, which restricts the context they can exploit; they cannot account for rapid scale changes, because they assign a single scale to each large image patch; and they can use only a narrow range of receptive fields if the networks are to remain a feasible size. In this study, we introduce an end-to-end trainable deep architecture that merges features extracted with multiple kernels of different sizes and learns essential properties such as rapid scale changes and the appropriate context to use at each image location. This technique flexibly encodes the scale of the contextual information needed to predict crowd density precisely. The training and validation losses of the proposed approach are 5% and 4% lower, respectively, than those of the state-of-the-art context-aware method.
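The abstract does not specify the pooling operator, so the following is only a rough illustration of the general idea of merging information gathered at several scales. It assumes a pyramid-style scheme: a (C, H, W) feature map is average-pooled onto grids of several sizes, each pooled map is upsampled back to H x W by nearest-neighbour lookup, and the results are concatenated with the input along the channel axis. The function names `adaptive_avg_pool2d`, `upsample_nearest` and `multi_scale_pool`, and the grid sizes (1, 2, 3, 6), are hypothetical choices for this sketch, not the authors' architecture. In a trained network, later layers would additionally learn how much weight to give each scale at each location.

```python
import numpy as np


def adaptive_avg_pool2d(feat: np.ndarray, bins: int) -> np.ndarray:
    """Average-pool a (C, H, W) map onto a (C, bins, bins) grid.

    Bin edges use floor for the start and ceil for the end, so every pixel
    is covered even when H or W is not divisible by `bins`.
    """
    c, h, w = feat.shape
    out = np.empty((c, bins, bins), dtype=np.float64)
    for i in range(bins):
        r0, r1 = (i * h) // bins, -(-((i + 1) * h) // bins)
        for j in range(bins):
            c0, c1 = (j * w) // bins, -(-((j + 1) * w) // bins)
            out[:, i, j] = feat[:, r0:r1, c0:c1].mean(axis=(1, 2))
    return out


def upsample_nearest(pooled: np.ndarray, h: int, w: int) -> np.ndarray:
    """Resize a (C, bh, bw) grid back to (C, h, w) by nearest-neighbour lookup."""
    _, bh, bw = pooled.shape
    rows = (np.arange(h) * bh) // h  # grid row that each output row falls in
    cols = (np.arange(w) * bw) // w  # grid column that each output column falls in
    return pooled[:, rows[:, None], cols[None, :]]


def multi_scale_pool(feat, scales=(1, 2, 3, 6)) -> np.ndarray:
    """Concatenate the input with its pooled-and-upsampled versions at each scale.

    Hypothetical sketch: output shape is (C * (1 + len(scales)), H, W).
    """
    feat = np.asarray(feat, dtype=np.float64)
    _, h, w = feat.shape
    branches = [feat]  # keep the original local features
    for s in scales:
        # Coarser grids summarise wider context around each location.
        branches.append(upsample_nearest(adaptive_avg_pool2d(feat, s), h, w))
    return np.concatenate(branches, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = rng.random((4, 12, 12))
    fused = multi_scale_pool(features)
    print(fused.shape)  # (20, 12, 12)
```

Each pixel of the fused map thus carries its own features alongside averages taken over regions of decreasing size (the whole image for scale 1, quadrants for scale 2, and so on). This gives downstream layers access to a wide range of effective receptive fields without using very large kernels.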
