Abstract

Crowd counting is important in many applications, including video surveillance, traffic monitoring, public security, and urban planning. However, counting crowds accurately and generating a precise density map remain challenging because of occlusion, perspective distortion, complex backgrounds, and scale variation. In addition, most existing methods focus only on counting accuracy and ignore whether the density distribution is correct, so a generated density map can contain many false negatives and false positives. To address this issue, the authors propose a novel encoder-decoder convolutional neural network (CNN) that fuses feature maps in both the encoding and decoding sub-networks to generate a more reasonable density map and estimate the number of people more accurately. Furthermore, the authors introduce a new evaluation metric, the patch absolute error (PAE), which is suited to measuring the accuracy of a density map. Extensive experiments on several public crowd counting datasets demonstrate that their approach outperforms current state-of-the-art methods. Finally, because crowd counting in practice often spans different scenes, the authors also evaluate their model on several cross-scene datasets, where it likewise performs well.
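The abstract names the patch absolute error but does not define it. The sketch below is one plausible reading, not the authors' formula: it assumes PAE splits the density map into non-overlapping square patches and sums the absolute difference between predicted and ground-truth counts per patch. The function name `patch_absolute_error`, the `patch_size` parameter, and the zero-padding of edge patches are all illustrative assumptions.

```python
import numpy as np


def patch_absolute_error(pred, gt, patch_size):
    """Assumed PAE: sum over patches of |predicted count - true count|.

    This is a hypothetical definition for illustration; the abstract
    does not give the exact formula.
    """
    if pred.shape != gt.shape:
        raise ValueError("density maps must have the same shape")
    h, w = pred.shape
    # Zero-pad so both dimensions are multiples of patch_size (assumption).
    pad_h = (-h) % patch_size
    pad_w = (-w) % patch_size
    diff = np.pad(pred - gt, ((0, pad_h), (0, pad_w)))
    rows, cols = diff.shape
    # Group pixels into (patch_row, in_patch_row, patch_col, in_patch_col).
    patches = diff.reshape(rows // patch_size, patch_size,
                           cols // patch_size, patch_size)
    # Summing the difference inside a patch gives its count error.
    per_patch_error = patches.sum(axis=(1, 3))
    return float(np.abs(per_patch_error).sum())


# One person, predicted in the wrong corner of a 4x4 map.
gt_map = np.zeros((4, 4))
gt_map[0, 0] = 1.0
pred_map = np.zeros((4, 4))
pred_map[3, 3] = 1.0

global_error = abs(pred_map.sum() - gt_map.sum())            # total count matches
coarse_pae = patch_absolute_error(pred_map, gt_map, 4)       # one patch: same as global
fine_pae = patch_absolute_error(pred_map, gt_map, 2)         # four patches: mismatch exposed
```

In this toy case the total count is correct, so a count-only metric reports no error, while the patch-level measure penalizes both the missed person (a false negative) and the spurious one (a false positive). That is the kind of density-distribution error the abstract says count-only evaluation overlooks.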
