Crowd counting method based on the self-attention residual network

Yan-Bo Liu, Rui-Sheng Jia, Xing-Li Zhang, Qing-Ming Liu, Hong-Mei Sun

doi:10.1007/s10489-020-01842-w

Abstract

Estimating crowd density in surveillance videos is an active research topic in computer vision and has become a basis for data processing and analysis in public transport services, commercial passenger flow analysis, public security protection and other industries. In practical applications, however, pedestrian occlusion and scale changes make existing methods inadequate at capturing human heads, which reduces counting accuracy. To solve this problem, a crowd counting method based on a self-attention residual network is proposed. First, a multiscale convolution module composed of dilated convolution and deformable convolution is used. Without losing image resolution, the module shifts some of its sampling points toward the occluded crowd, which addresses the problem of crowd occlusion. Then, a self-attention residual module scores and classifies every pixel in the feature map and generates a corresponding weight; the crowd scale is determined from this weight, which addresses the problem of crowd scale changes. The algorithm is tested on the ShanghaiTech, UCF_CC_50 and WorldExpo’10 datasets. The experimental results show that the mean absolute error (MAE) and mean square error (MSE) of this algorithm are significantly lower than those of a comparative algorithm.
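As an illustration of the two evaluation metrics named above, the following minimal sketch computes MAE and MSE over per-image crowd counts. The abstract does not give the formulas, so this assumes the convention common in the crowd counting literature, in which the value reported as MSE is the square root of the mean squared count error. The function name `counting_errors` and the sample counts are hypothetical and are not taken from the paper.

```python
import math

def counting_errors(predicted, actual):
    """Return (MAE, MSE) over per-image crowd counts.

    Assumption: MSE is the root of the mean squared error, as is
    conventional in crowd counting papers; the abstract does not
    state the exact formula.
    """
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    diffs = [p - a for p, a in zip(predicted, actual)]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    mse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mae, mse

# Hypothetical predicted and ground-truth counts for three test images.
mae, mse = counting_errors([105, 48, 310], [100, 50, 300])
print(f"MAE = {mae:.3f}, MSE = {mse:.3f}")
```

Because the errors are squared before averaging, MSE penalises large per-image miscounts more heavily than MAE, which is why the two are usually reported together.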
