Abstract
This paper presents a novel video processing method for accurate and fast anomaly detection and localization in crowded scenes. We propose a cubic-patch-based method built on a cascade of classifiers that combines the results of two types of video descriptors. Exploiting the low likelihood of anomalies and the redundancy of structures in the normal patches of a video, we introduce two efficient feature sets for describing the spatial and temporal video context, named the "local" and "global" descriptors. The local descriptor is based on the relation of a patch to its neighbors, and the global descriptor is produced by a sparse auto-encoder. Two reference models are learned as one-class classifiers from the local and global descriptions of the normal training patches. To be both fast and accurate, these two classifiers are combined into a cascade. First, the local classifier, which is faster than the global one, identifies many normal cubic patches early. The remaining patches are then checked more carefully by the global classifier. We also propose a technique for learning from small patches and inferring from larger patches, which improves performance. We show that the proposed method performs comparably to, or even better than, top-performing detection and localization methods on standard benchmarks, with a substantial improvement in speed.
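The cascade described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: both stages are modeled as Gaussian one-class classifiers scored by squared Mahalanobis distance, the feature vectors are synthetic stand-ins for the local descriptor and the sparse auto-encoder output, and the names `GaussianOneClass` and `cascade_predict`, as well as the quantile thresholds, are hypothetical.

```python
import numpy as np


class GaussianOneClass:
    """Hypothetical one-class model: fit a Gaussian to normal features and
    score samples by squared Mahalanobis distance (an assumed choice)."""

    def fit(self, X, quantile):
        self.mean_ = X.mean(axis=0)
        # Small ridge keeps the covariance invertible.
        cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])
        self.prec_ = np.linalg.inv(cov)
        # Threshold chosen so that `quantile` of the training patches pass.
        self.threshold_ = np.quantile(self.score(X), quantile)
        return self

    def score(self, X):
        d = X - self.mean_
        return np.einsum("ij,jk,ik->i", d, self.prec_, d)

    def is_normal(self, X):
        return self.score(X) <= self.threshold_


def cascade_predict(local_clf, global_clf, local_feats, global_feats):
    """Flag a patch as anomalous only if both stages reject it.

    Stage 1 (cheap) clears most normal patches early; stage 2 (costly)
    runs only on the patches that stage 1 did not clear.
    """
    anomalous = np.zeros(len(local_feats), dtype=bool)
    remaining = np.flatnonzero(~local_clf.is_normal(local_feats))
    if remaining.size:
        anomalous[remaining] = ~global_clf.is_normal(global_feats[remaining])
    return anomalous, remaining.size


rng = np.random.default_rng(0)

# Synthetic stand-ins for descriptors of normal training patches.
local_train = rng.normal(size=(500, 4))
global_train = rng.normal(size=(500, 8))
local_clf = GaussianOneClass().fit(local_train, quantile=0.90)
global_clf = GaussianOneClass().fit(global_train, quantile=0.99)

# Test set: 100 normal patches followed by 5 strongly shifted "anomalies".
local_test = np.vstack([rng.normal(size=(100, 4)),
                        rng.normal(loc=10.0, size=(5, 4))])
global_test = np.vstack([rng.normal(size=(100, 8)),
                         rng.normal(loc=10.0, size=(5, 8))])

flags, n_checked = cascade_predict(local_clf, global_clf,
                                   local_test, global_test)
print("patches sent to the global stage:", n_checked, "of", len(flags))
print("patches flagged as anomalous:", int(flags.sum()))
```

The speed benefit in this sketch comes from the second stage seeing only the subset of patches that the first stage could not clear, so the more expensive model is evaluated on a small fraction of the input.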