Abstract

Estimating crowd density in surveillance video is an active research topic in computer vision and underpins data processing and analysis in public transport services, commercial passenger flow analysis, public security and other industries. In practical applications, however, pedestrian occlusion and scale changes prevent existing methods from reliably capturing human heads, which reduces counting accuracy. To solve this problem, a crowd counting method based on a self-attention residual network is proposed. First, a multiscale convolution module composed of dilated convolution and deformable convolution is used. Without reducing image resolution, the module offsets its sampling points so that some sampling positions shift onto the occluded crowd, which addresses the problem of crowd occlusion. Then, a self-attention residual module is designed to score and classify all pixels in the feature map. The corresponding weights are generated, and the crowd scale is determined from these weights, which addresses the problem of crowd scale changes. The algorithm is tested on the ShanghaiTech, UCF_CC_50 and WorldExpo’10 datasets. The experimental results show that its mean absolute error (MAE) and mean square error (MSE) are significantly lower than those of the comparison algorithm.
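
As a minimal sketch of the two evaluation metrics named above, the following Python computes MAE and MSE over per-image crowd counts. The function name `count_errors` and the sample counts are hypothetical and are not taken from the paper. The abstract does not state which MSE convention is used; many crowd counting papers report the square root of the mean squared error under the name MSE, so the sketch returns both values.

```python
import math


def count_errors(true_counts, predicted_counts):
    """Return (MAE, MSE, RMSE) over per-image crowd counts.

    Hypothetical helper for illustration; not the authors' evaluation code.
    """
    if len(true_counts) != len(predicted_counts) or not true_counts:
        raise ValueError("need two non-empty sequences of equal length")
    # Signed counting error for each image: predicted minus ground truth.
    diffs = [p - t for t, p in zip(true_counts, predicted_counts)]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    mse = sum(d * d for d in diffs) / len(diffs)
    # Square root of the mean squared error, often reported as "MSE"
    # in the crowd counting literature (an assumption about convention).
    return mae, mse, math.sqrt(mse)


# Made-up counts for three images (not data from the paper).
mae, mse, rmse = count_errors([100, 250, 40], [110, 240, 43])
print(f"MAE={mae:.2f}  MSE={mse:.2f}  RMSE={rmse:.2f}")
```

MAE reflects the average size of the counting error, while the squared-error metric penalizes large per-image errors more heavily, which is why the two are usually reported together.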
