Abstract

To accurately identify objects of different sizes, we propose an efficient Multi-Scale and Multi-Column Convolutional Neural Network (MSMC) for crowd density estimation. The ground truth is generated from the existing label information, and the image is fed into our model to learn the relationship between the predicted density map and the ground truth. The network is composed of three components: feature extraction, feature fusion and feature regression. First, VGG16 is used for faster feature extraction. Second, VGG16 layers of different sizes are fused, which helps detect objects of different sizes. Third, we apply multi-channel convolution to further address the multi-size problem. After the fusion block, dilated convolution is employed to enlarge the receptive field without increasing the number of parameters. In crowd density estimation, the combination of multiple sizes and multiple channels enhances the network's ability to capture information, improves the mapping from the original image to the density map, and increases the accuracy of the estimate. Test results on the ShanghaiTech and UCF_CC_50 datasets, provided in the Experiment section, show that the proposed method achieves excellent performance in both accuracy and robustness.
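
The claim that dilated convolution enlarges the receptive field without adding parameters can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names, the 3x3 kernel, the 512 channels and the dilation rates are assumptions chosen for the example and are not taken from the MSMC architecture.

```python
def conv_param_count(kernel_size, in_channels, out_channels, bias=True):
    """Learnable parameters of a 2D convolution; dilation does not appear here."""
    weights = kernel_size * kernel_size * in_channels * out_channels
    return weights + (out_channels if bias else 0)


def effective_kernel_size(kernel_size, dilation):
    """Spatial extent covered by one dilated kernel along each axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def stacked_receptive_field(kernel_size, dilations):
    """Receptive field of stride-1 convolutions stacked with the given dilations."""
    field = 1
    for dilation in dilations:
        field += effective_kernel_size(kernel_size, dilation) - 1
    return field


# Hypothetical setting: three 3x3 layers with 512 input and output channels.
params_per_layer = conv_param_count(3, 512, 512)
plain_field = stacked_receptive_field(3, [1, 1, 1])    # ordinary convolutions
dilated_field = stacked_receptive_field(3, [2, 2, 2])  # dilation rate 2

print(params_per_layer, plain_field, dilated_field)
```

Under these assumptions both stacks have the same parameter count per layer, while the dilated stack covers a 13x13 input region instead of 7x7, because the parameter count depends only on kernel size and channel counts.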
