Abstract

Video anomaly detection (VAD) is a significant challenge in computer vision: the goal is to distinguish diverse anomalous events from a much larger number of normal ones. Weakly supervised video anomaly detection has recently emerged as a promising solution, enabling the detection of anomalous snippets with only video-level annotations. However, the knowledge contained in anomaly annotations remains underutilized, which leaves a gap between the visual space and the semantic understanding of anomalies and makes it difficult to capture a clear boundary between anomalies and normalities. We therefore propose a weakly supervised paradigm of cross-modal detection and consistency learning, which leverages dual consistency to provide discriminative representations for anomalies at both the semantic-to-target and target-to-snippet levels. Specifically, we introduce a cross-modal detection network, which detects the targets in each frame according to given semantic rules, to derive semantic-consistent visual embeddings. To delineate the boundary between anomalies and normalities, we propose a cross-domain alignment module that enhances the discriminative representation of abnormal targets by learning the contextual consistency between the target and snippet embeddings. Because our architecture detects semantic-consistent targets based on variable semantic rules, it can be transferred across deployment scenarios and supports the identification, localization, and recognition of abnormal events through a “when-where-which” pipeline. We evaluate our approach on four widely used public benchmarks, ShanghaiTech, UCSD Ped2, CUHK Avenue, and UBnormal, through extensive qualitative and quantitative analyses. The results demonstrate the effectiveness of our approach on the VAD task.
