Crowd counting has attracted wide attention because of its applications in congested scenes. Current methods perform well on specialized datasets but often ignore how perspective distorts the observed density distribution. To investigate the true distribution of crowd density in the physical world, this paper introduces a framework that leverages head-size variations to estimate the real crowd distribution and to count crowds. First, a convolutional neural network is designed to generate predicted density maps and corresponding bounding boxes across different datasets. Second, a filter kernel is employed to identify the most crowded area in the input image based on the obtained boxes. Finally, the vanishing point is computed as the intersection of two generated lines and is then used to obtain the inverse perspective transformation of the predicted density map. The Grid Mean Relative Error (GMRE) metric is proposed for evaluating transformation accuracy. Compared with the Grid Average Mean Absolute Error (GAME), GMRE is distance-aware and better suited to evaluating differences between distinct coordinate systems. Additionally, extensive experiments are conducted to validate the counting capabilities of the proposed network. Experimental results show that the network achieves competitive counting performance and reduced transformation error.
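The vanishing-point step can be illustrated with a minimal sketch of intersecting two image lines in homogeneous coordinates. The abstract does not specify how the two lines are generated, so the point pairs below are made-up inputs, and the helper names `cross`, `line_through`, and `vanishing_point` are hypothetical and not taken from the paper.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def line_through(p, q):
    """Homogeneous line (a, b, c), with a*x + b*y + c = 0, through two image points."""
    return cross((p[0], p[1], 1.0), (q[0], q[1], 1.0))


def vanishing_point(line_a, line_b, eps=1e-9):
    """Intersect two homogeneous lines.

    Returns (x, y), or None when the lines are (nearly) parallel and
    the intersection lies at infinity. The eps threshold is an
    illustrative choice and depends on the scale of the coordinates.
    """
    x, y, w = cross(line_a, line_b)
    if abs(w) < eps:
        return None
    return (x / w, y / w)


# Made-up example: the lines y = x and y = 4 - x meet at (2, 2).
left = line_through((0.0, 0.0), (1.0, 1.0))
right = line_through((4.0, 0.0), (3.0, 1.0))
vp = vanishing_point(left, right)
print(vp)  # (2.0, 2.0)
```

The homogeneous form avoids special-casing vertical lines, and parallel lines show up naturally as a zero third coordinate.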