Abstract

Crowd counting in images is a challenging problem. Many neural-network-based methods use two-branch or multi-branch networks to extract high-level features at different scales or densities, and then merge these features with a fusion operation. Although these methods reduce the counting error, they require an enormous number of parameters, which makes training and optimization inefficient and consumes substantial computing resources. To address this, a residual network based on depthwise separable convolution is proposed for image crowd counting. Depthwise separable convolution reduces the amount of computation, while the residual structure allows a deeper network that extracts more effective high-level features. Experiments show that, compared with state-of-the-art methods, the proposed method dramatically reduces the parameter count to 1.91 million while achieving comparable accuracy.
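
The parameter saving from depthwise separable convolution can be illustrated with a small calculation. The sketch below is not the paper's network: the function names and the layer sizes (64 input channels, 128 output channels, 3 x 3 kernel) are assumptions chosen only for illustration, and bias terms are omitted. It compares the weight count of a standard convolution with that of a depthwise convolution followed by a 1 x 1 pointwise convolution.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k convolution followed by a
    1 x 1 pointwise convolution (bias omitted)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Illustrative layer sizes (assumed, not taken from the paper).
    c_in, c_out, k = 64, 128, 3
    std = standard_conv_params(c_in, c_out, k)
    sep = depthwise_separable_params(c_in, c_out, k)
    print(f"standard: {std}, separable: {sep}, ratio: {sep / std:.4f}")
```

For these assumed sizes the separable layer needs 8,768 weights instead of 73,728. In general the ratio is 1/c_out + 1/k^2, so with 3 x 3 kernels the saving approaches a factor of nine as the number of output channels grows.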
