Abstract
Video Anomaly Detection (VAD) is an essential yet challenging task in the signal processing community; it aims to understand the spatial and temporal contextual interactions between objects and their surrounding scenes in order to detect unexpected events in surveillance videos. However, existing unsupervised methods either use a single network to learn global prototype patterns without distinguishing foreground objects from background scenes, or strip objects from their frames altogether, ignoring that the essence of anomalies lies in unusual object-scene interactions. To this end, this letter proposes an Object-centric Scene Inference Network (OSIN) that uses a well-designed three-stream structure to learn both global scene normality and local object-specific normal patterns, and that explores object-scene interactions via scene memory networks. Experimental results on three benchmark datasets demonstrate the effectiveness of the proposed OSIN model, which achieves frame-level AUCs of 91.7%, 79.6%, and 98.3% on the CUHK Avenue, ShanghaiTech, and UCSD Ped2 datasets, respectively.
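To make the three-stream idea concrete, the sketch below shows a minimal PyTorch layout with a global scene stream, an object stream, and an interaction head that reads from a learnable scene-prototype memory via cosine addressing. This is not the authors' OSIN implementation; the module names, feature sizes, and memory addressing scheme are illustrative assumptions in the spirit of memory-augmented VAD models.

```python
# Hypothetical three-stream sketch; NOT the OSIN code from the letter.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SceneMemory(nn.Module):
    """Learnable memory of normal-scene prototypes, read by cosine-similarity addressing."""

    def __init__(self, num_items: int = 10, dim: int = 128):
        super().__init__()
        self.items = nn.Parameter(torch.randn(num_items, dim))

    def forward(self, query: torch.Tensor) -> torch.Tensor:
        # query: (B, dim) -> soft attention over memory items -> retrieved prototype (B, dim)
        attn = F.softmax(
            F.normalize(query, dim=-1) @ F.normalize(self.items, dim=-1).t(), dim=-1
        )
        return attn @ self.items


class ThreeStreamVAD(nn.Module):
    """Global-scene stream + object stream + interaction head (all hypothetical)."""

    def __init__(self, dim: int = 128):
        super().__init__()
        self.scene_enc = nn.Sequential(
            nn.Conv2d(3, 32, 3, 2, 1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, dim)
        )
        self.object_enc = nn.Sequential(
            nn.Conv2d(3, 32, 3, 2, 1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, dim)
        )
        self.memory = SceneMemory(dim=dim)
        self.interaction = nn.Sequential(
            nn.Linear(3 * dim, dim), nn.ReLU(), nn.Linear(dim, 1)
        )

    def forward(self, frame: torch.Tensor, obj_crop: torch.Tensor) -> torch.Tensor:
        scene = self.scene_enc(frame)       # global scene normality
        obj = self.object_enc(obj_crop)     # local object-specific pattern
        proto = self.memory(scene)          # retrieved normal-scene prototype
        fused = torch.cat([scene, obj, proto], dim=-1)
        return self.interaction(fused)      # per-object anomaly score


if __name__ == "__main__":
    model = ThreeStreamVAD()
    frame = torch.rand(2, 3, 128, 128)      # full surveillance frames
    crop = torch.rand(2, 3, 64, 64)         # detected object crops
    print(model(frame, crop).shape)         # torch.Size([2, 1])
```

At inference, scoring each detected object against the scene prototype retrieved from memory is one way unusual object-scene interactions could surface as high anomaly scores; the actual OSIN scoring function is described in the full letter.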