Abstract

Estimating the number of people present in an image has many practical applications, including visual surveillance and public resource management. Recently, regression-based methods for people counting have gained considerable importance, principally because they can handle crowded scenes. However, the principal drawback of regression-based methods is the need to find an optimal set of features and a model, both of which usually depend on the crowd density. Encouraged by the recent success of sparse representation, we develop a robust and scalable people counting method. Sparse representation allows us to capture the hidden structure and semantic information in visual data and leads to faster processing algorithms. To reduce the complexity of solving the ℓ1-minimization problem, which lies at the heart of sparse representation, we employ a dimensionality reduction method based on random projection. The sparse representation framework provides the new insight that, if sparsity in the classification problem is properly harnessed, feature extraction is no longer critical. So, in addition to several hand-crafted features, we exploit features obtained from a pre-trained deep convolutional neural network and show that these features perform competitively. Further, to make the proposed method user-friendly, we employ a semi-supervised elastic net that automatically annotates unlabelled data from only a handful of user-labelled image frames. Our semi-supervised method exploits temporal continuity in videos. Extensive evaluations on crowd analysis benchmark datasets demonstrate the effectiveness of our approach and its superiority over state-of-the-art regression-based people counting methods in terms of both accuracy and time.
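
To make the combination of random projection and ℓ1-minimization concrete, the following is a minimal sketch and not the authors' implementation. It assumes a Gaussian random projection matrix, solves the ℓ1-regularized least-squares (lasso) form of the problem with iterative soft-thresholding (ISTA), and uses random vectors as stand-ins for real image features. All function names, the regularization weight `lam`, and the dimensions are hypothetical choices made for illustration.

```python
import math
import random

def matvec(M, v):
    """Multiply a matrix stored as a list of rows by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def gaussian_projection(n_out, n_in, rng):
    """Random matrix with i.i.d. N(0, 1/n_out) entries (assumed projection type)."""
    sigma = 1.0 / math.sqrt(n_out)
    return [[rng.gauss(0.0, sigma) for _ in range(n_in)] for _ in range(n_out)]

def soft_threshold(x, t):
    """Proximal operator of t*|x|: shrink x toward zero by t."""
    return math.copysign(max(abs(x) - t, 0.0), x)

def ista(D, y, lam=0.01, n_iter=500):
    """Minimise 0.5*||y - D a||^2 + lam*||a||_1 by iterative soft-thresholding."""
    k = len(D[0])
    Dt = [list(col) for col in zip(*D)]  # transpose of D
    # The squared Frobenius norm upper-bounds the largest eigenvalue of D^T D,
    # so 1/L is a safe (if conservative) step size.
    L = sum(x * x for row in D for x in row)
    a = [0.0] * k
    for _ in range(n_iter):
        residual = [yi - pi for yi, pi in zip(y, matvec(D, a))]
        neg_grad = matvec(Dt, residual)  # D^T (y - D a)
        a = [soft_threshold(ai + gi / L, lam / L) for ai, gi in zip(a, neg_grad)]
    return a

def sparse_code_demo(seed=0, n_in=200, n_out=30, k=8, target=3):
    """Project stand-in features to n_out dimensions, then sparse-code a query."""
    rng = random.Random(seed)
    # Stand-in "training features": k random vectors, not real crowd features.
    train = [[rng.gauss(0.0, 1.0) / math.sqrt(n_in) for _ in range(n_in)]
             for _ in range(k)]
    P = gaussian_projection(n_out, n_in, rng)
    Z = [matvec(P, x) for x in train]   # projected training vectors
    D = [list(row) for row in zip(*Z)]  # n_out x k dictionary, one column per sample
    y = matvec(P, train[target])        # projected query: a copy of one training vector
    return ista(D, y)

if __name__ == "__main__":
    coeffs = sparse_code_demo()
    print([round(c, 3) for c in coeffs])
```

In this toy setting the query is a copy of one training vector, so the recovered coefficient vector should concentrate on that vector's index even though the optimization runs in the 30-dimensional projected space rather than the original 200 dimensions. The per-iteration cost of the solver scales with the projected dimension, which is the motivation for reducing dimensionality before solving the ℓ1 problem.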
