Abstract

Crowd counting aims to estimate the number of pedestrians in a crowd image. Crowd counting, or density estimation, is an extremely challenging computer vision task because of large scale variations and dense scenes. Current methods address these issues by combining multi-scale Convolutional Neural Networks with different receptive fields. In this paper, a novel end-to-end architecture based on a Multi-Scale Adversarial Convolutional Neural Network (MSA-CNN) is proposed to generate crowd density maps and estimate the crowd count. First, a multi-scale network extracts globally relevant features from the crowd image; fractionally-strided convolutional layers then up-sample the output to recover crucial details lost in the earlier max-pooling layers. An adversarial loss is employed to constrain the estimated density map to the realistic subspace, which reduces the blurring effect in density estimation. The network is trained end to end with a combination of adversarial loss and Euclidean loss, and this joint training improves density estimation performance. Extensive experiments on available datasets show that the proposed approach improves significantly on available state-of-the-art approaches.
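The joint objective described above can be illustrated with a minimal sketch. It is not the paper's implementation: the per-pixel normalization of the Euclidean term, the generator-side form -log D(G(x)) of the adversarial term, the weight `adv_weight`, and all function names are assumptions chosen for illustration. The sketch also shows the usual convention that the crowd count is the sum of the density map.

```python
import math


def euclidean_loss(pred, target):
    """Mean squared difference between two density maps (lists of rows).

    Assumption: normalized by the number of pixels; the paper may
    normalize differently (for example, by batch size).
    """
    total = 0.0
    n = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            total += (p - t) ** 2
            n += 1
    return total / n


def adversarial_loss(disc_probs, eps=1e-12):
    """Generator-side adversarial term, -log D(G(x)), averaged.

    disc_probs: hypothetical discriminator outputs in (0, 1] for the
    generated density maps; eps guards against log(0).
    """
    return -sum(math.log(max(p, eps)) for p in disc_probs) / len(disc_probs)


def joint_loss(pred, target, disc_probs, adv_weight=0.01):
    """Weighted sum of both terms; adv_weight is an assumed hyperparameter."""
    return euclidean_loss(pred, target) + adv_weight * adversarial_loss(disc_probs)


def crowd_count(density):
    """Estimated number of people: the sum over the density map."""
    return sum(sum(row) for row in density)
```

For example, `crowd_count([[0.5, 0.5], [1.0, 0.0]])` sums the four pixel values of a toy 2x2 density map. A discriminator output close to 1 makes the adversarial term vanish, so the joint loss then reduces to the Euclidean term.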

Highlights

  • With the rapid growth in the urban population, public safety issues have become the focus of attention in video surveillance

  • A novel end-to-end architecture based on a Multi-Scale Adversarial Convolutional Neural Network (MSA-CNN) is proposed to generate crowd density maps and estimate the crowd count

  • To address these issues, a new crowd counting framework called Multi-Scale Adversarial Convolutional Neural Network (MSA-CNN) is proposed, building on the multi-column CNN [19], which has been successful in crowd counting


Summary

Introduction

With the rapid growth in the urban population, public safety issues have become the focus of attention in video surveillance. If the crowd is very dense, the occlusion between pedestrians is more serious, which may result in poor detection. These methods are based on traditional regression over hand-crafted features and achieve better performance than detection by regressing the number of pedestrians in the image. Inspired by the recent success of convolutional neural networks (CNNs) on multiple computer vision tasks, many CNN-based methods [13,14,15] were developed to solve these issues and obtained remarkable success. Local optimization is achieved by minimizing the Euclidean loss, and all sub-networks are fine-tuned by joint training. To address these issues, a new crowd counting framework called Multi-Scale Adversarial Convolutional Neural Network (MSA-CNN) is proposed, building on the multi-column CNN [19], which has been successful in crowd counting. Experiments show that our method outperforms current state-of-the-art methods.

Related works
Network architecture
Objective function
Training and Implementation Details
Experiments
Experiment on ShanghaiTech dataset
Comparisons with State-of-the-art
Conclusion
