Crowd counting, which is widely used in disaster management, traffic monitoring, and other areas of urban security, is a challenging task that is attracting increasing interest from researchers. To improve accuracy, most methods have attempted to explicitly handle scale variation, which manifests as large changes in object size. However, earlier methods based on convolutional neural networks (CNNs) have focused primarily on improving accuracy while ignoring model complexity. This paper proposes a novel lightweight CNN-based network for estimating crowd counts and generating density maps under resource constraints. The network is composed of three components: a basic feature extractor (BFE), a stacked à trous convolution module (SACM), and a context fusion module (CFM). The BFE encodes basic feature information at reduced spatial resolution for further refinement. The SACM generates diverse contextual information through a short pipeline. The CFM distills the feature maps from these components to generate a context fusion density map. The whole network is trained end to end and uses a compression factor to restrict its size. Experiments on three highly challenging datasets demonstrate that the proposed method delivers attractive performance.
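The abstract relies on three generic ideas: à trous (dilated) convolution, which enlarges the receptive field without reducing spatial resolution; density-map counting, where the estimated count is the sum of the predicted map; and a compression factor that limits model size. The pure-Python sketch below illustrates only these building blocks, not the authors' SACM, CFM, or BFE. The dilation rates (1, 2, 3), the 3×3 kernels, and the reading of the compression factor as a channel-width multiplier `alpha` are hypothetical assumptions chosen for illustration.

```python
from typing import List, Sequence

Grid = List[List[float]]


def atrous_conv2d(x: Grid, kernel: Grid, rate: int) -> Grid:
    """Single-channel 2-D à trous (dilated) convolution, stride 1, zero 'same' padding.

    Implemented as cross-correlation, as deep learning frameworks do. Kernel taps
    are spaced `rate` pixels apart, so the output keeps the input's resolution.
    """
    h, w = len(x), len(x[0])
    k = len(kernel)  # assumes a square kernel with odd side length
    c = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for u in range(k):
                for v in range(k):
                    r, s = i + (u - c) * rate, j + (v - c) * rate
                    if 0 <= r < h and 0 <= s < w:  # zero padding outside the map
                        acc += kernel[u][v] * x[r][s]
            out[i][j] = acc
    return out


def receptive_field(rates: Sequence[int], k: int = 3) -> int:
    """Receptive field (pixels per side) of stacked stride-1 dilated k x k convolutions."""
    return 1 + sum((k - 1) * r for r in rates)


def density_to_count(density: Grid) -> float:
    """Density-map counting: the estimated count is the sum (integral) of the map."""
    return sum(sum(row) for row in density)


def scaled_channels(base: int, alpha: float) -> int:
    """Hypothetical compression factor `alpha` applied as a channel-width multiplier."""
    return max(1, int(round(base * alpha)))


def conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights plus biases of one k x k convolution layer."""
    return k * k * c_in * c_out + c_out


if __name__ == "__main__":
    impulse = [[0.0] * 7 for _ in range(7)]
    impulse[3][3] = 1.0
    ones = [[1.0] * 3 for _ in range(3)]
    y = atrous_conv2d(impulse, ones, rate=2)
    print(len(y), len(y[0]))           # 7 7: spatial resolution is preserved
    print(receptive_field([1, 2, 3]))  # 13: three stacked 3x3 layers with rates 1, 2, 3
    print(conv_params(3, 64, 64), conv_params(3, scaled_channels(64, 0.5), scaled_channels(64, 0.5)))
```

Under the width-multiplier assumption, scaling both input and output channels by `alpha` shrinks a convolution's weight count by roughly `alpha**2` (36928 vs. 9248 parameters for `alpha = 0.5` above). Stacking dilated layers grows the receptive field linearly in the sum of the rates while the map size stays fixed.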