CASA-Crowd: A Context-Aware Scale Aggregation CNN-Based Crowd Counting Technique

Naveed Ilyas, Kiseon Kim, Ashfaq Ahmad

doi:10.1109/access.2019.2960292

Abstract

The accuracy of object-based computer vision techniques declines due to major challenges originating from large scale variation, varying shape, perspective variation, and lack of side information. To handle these challenges, most crowd counting methods either use multi-column architectures (which restrict themselves to a set of specific density scenes) or deploy deeper, multi-network architectures for density estimation. However, these techniques suffer from several drawbacks: extraction of identical features across columns, computationally complex architectures, overestimation of density in sparse areas, underestimation in dense areas, and averaging of feature maps, which reduces the quality of the density map. To overcome these drawbacks and to provide state-of-the-art counting accuracy at comparable computational cost, we propose a deeper and wider network, a Context-aware Scale Aggregation CNN-based Crowd Counting method (CASA-Crowd), to obtain deep, scale-varying, and perspective-varying features. Further, we include dilated convolutions with varying filter sizes to obtain contextual information. In addition, the different dilation rates vary the receptive field size, which helps overcome perspective distortion. The quality of the density map is enhanced while preserving the spatial dimension, at comparable computational complexity. We evaluate our method on three well-known datasets: UCF_CC_50, ShanghaiTech Part_A, and ShanghaiTech Part_B.
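
The link between dilation rate and receptive field size mentioned in the abstract can be illustrated with the standard formula for the span of a dilated kernel. The sketch below is a minimal illustration only: the kernel sizes and dilation rates are assumed values chosen for the example, not the configuration used in CASA-Crowd, and the function names are hypothetical.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span, in pixels along one axis, covered by a single dilated kernel.

    A dilation rate d inserts (d - 1) gaps between neighbouring kernel taps,
    so a k-tap kernel spans k + (k - 1) * (d - 1) pixels.
    """
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def same_padding(kernel_size: int, dilation: int) -> int:
    """Zero padding per side that preserves spatial size at stride 1 (odd kernels)."""
    return (effective_kernel_size(kernel_size, dilation) - 1) // 2


def output_size(input_size: int, kernel_size: int, dilation: int,
                padding: int, stride: int = 1) -> int:
    """Output length along one axis for a dilated convolution."""
    span = effective_kernel_size(kernel_size, dilation)
    return (input_size + 2 * padding - span) // stride + 1


if __name__ == "__main__":
    # Assumed, illustrative settings: not taken from the paper.
    for k in (3, 5):
        for d in (1, 2, 3):
            pad = same_padding(k, d)
            print(f"kernel={k} dilation={d} "
                  f"span={effective_kernel_size(k, d)} "
                  f"padding={pad} "
                  f"out(64)={output_size(64, k, d, pad)}")
```

With suitable padding, a larger dilation rate widens the span seen by each filter without adding parameters or shrinking the feature map, which is consistent with the abstract's claim that spatial dimension is preserved while receptive field size varies.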

Highlights

Automated crowd counting refers to estimating the number of individuals in unconstrained scenes depicted by images and videos
We propose a deeper, wider and more robust approach named Context-aware Scale Aggregation convolutional neural networks (CNN)-based Crowd Counting Method (CASA-Crowd)
With the rapid growth of CNN-based techniques in classification, recognition, and especially segmentation tasks, CNN-based methods are now employed for density estimation and crowd counting

Summary

INTRODUCTION

Automated crowd counting refers to estimating the number of individuals in unconstrained scenes depicted by images and videos. Before CNN-based crowd counting model, different types of complex networks have been appended to increase accuracy. One example is a density-aware network that contains multiple sub-networks pre-trained on scenarios with different densities [11]. Based on these observations, we propose a deeper, wider and more robust approach named Context-aware Scale Aggregation CNN-based Crowd Counting Method (CASA-Crowd). Extensive experiments conducted on three challenging datasets show that our model achieves state-of-the-art performance.

RELATED WORK

GROUND TRUTH DENSITY ESTIMATION

EXPERIMENTS

Findings

CONCLUSION AND FUTURE WORK