Abstract

Anomaly detection in crowded scenes plays a crucial role in automatic video surveillance, helping to avert casualties in areas with heavy footfall. The key challenge in automatically classifying anomalies in crowd images is choosing a feature set and techniques that can be replicated across every crowded scenario. In this paper, we propose a novel concept, Aggregation of Ensembles (AOE), for detecting anomalies in video data of crowded scenes, which leverages the existing capability of pre-trained ConvNets and a pool of classifiers. The proposed approach uses an ensemble of different fine-tuned Convolutional Neural Networks (CNNs), based on the hypothesis that different CNN architectures learn different levels of semantic representation from crowd videos, so an ensemble of CNNs enables enriched feature sets to be extracted. The AOE concept uses the fine-tuned ConvNets as fixed feature extractors to train variants of the SVM classifier, and the resulting posterior probabilities are then fused to predict anomalies in the crowd frame sequences. The experimental results show that the proposed aggregation of ensembles of fine-tuned CNNs of various architectures achieves higher accuracy than other established methods on benchmark datasets.
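
The fusion step described above can be illustrated with a minimal sketch. The abstract does not state which fusion rule is used, so the simple averaging below is an assumption, and the function names `fuse_posteriors` and `predict` and the example probabilities are hypothetical. Each classifier is assumed to output, for every frame, a posterior probability per class (here, class 0 = normal and class 1 = anomalous).

```python
from typing import List, Sequence


def fuse_posteriors(
    posteriors: Sequence[Sequence[Sequence[float]]],
) -> List[List[float]]:
    """Average per-class posteriors across classifiers (assumed fusion rule).

    posteriors[k][i][c] is the probability that classifier k assigns
    to class c for frame i.
    """
    n_models = len(posteriors)
    n_frames = len(posteriors[0])
    n_classes = len(posteriors[0][0])
    fused = []
    for i in range(n_frames):
        row = [
            sum(posteriors[k][i][c] for k in range(n_models)) / n_models
            for c in range(n_classes)
        ]
        fused.append(row)
    return fused


def predict(fused: Sequence[Sequence[float]]) -> List[int]:
    """Pick the class with the highest fused probability for each frame."""
    return [max(range(len(row)), key=row.__getitem__) for row in fused]


# Hypothetical posteriors from three classifiers for two frames.
example = [
    [[0.9, 0.1], [0.4, 0.6]],    # classifier trained on features from CNN A
    [[0.8, 0.2], [0.3, 0.7]],    # classifier trained on features from CNN B
    [[0.6, 0.4], [0.55, 0.45]],  # classifier trained on features from CNN C
]
labels = predict(fuse_posteriors(example))
```

In this example the third classifier leans towards "normal" for the second frame, but the fused probability still favours "anomalous", which is the kind of disagreement an ensemble is meant to smooth over.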
