Abstract

In this paper, we propose a joint spatio-temporal representation learning method based on a 3D deep convolutional neural network, which simultaneously encodes appearance and motion information in 3D volumes extracted from multiple consecutive frames, together with an end-to-end learning framework for detecting abnormal events in surveillance scenes. By using this joint learning approach, the proposed framework can detect various abnormal events that appear with diverse motion and appearance patterns. The framework detects abnormal events in each volume by analyzing the spatio-temporal representation trained with the joint learning method; this volume-level detection makes it possible to localize an abnormal event. We verify the proposed joint learning method and framework on publicly available abnormal event datasets, including the UMN, UCSD, and subway datasets, by comparing them with existing state-of-the-art methods. The experimental results demonstrate that the proposed joint learning and event detection method not only detects various abnormal events more efficiently but also localizes anomalous regions more accurately.
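To make the volume-level detection idea concrete, the following is a minimal sketch of a 3D convolutional network that scores a spatio-temporal volume of consecutive frames; it is not the authors' exact architecture, and the layer sizes, 8-frame volume length, and 32x32 patch size are illustrative assumptions.

```python
# Minimal sketch (not the paper's exact model): a small 3D CNN that maps a
# volume of consecutive frames to a single per-volume abnormality score.
import torch
import torch.nn as nn

class VolumeAnomalyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            # 3D convolutions learn appearance (spatial) and motion (temporal)
            # cues jointly from the stacked frames.
            nn.Conv3d(1, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),
            nn.Conv3d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool3d(kernel_size=(2, 2, 2)),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 4 * 8 * 8, 128),
            nn.ReLU(),
            nn.Linear(128, 1),  # per-volume abnormality score
        )

    def forward(self, x):
        # x: (batch, 1, frames=8, height=32, width=32) grayscale volume
        return torch.sigmoid(self.classifier(self.features(x)))

# Example: score one spatio-temporal volume cropped from 8 consecutive frames.
volume = torch.randn(1, 1, 8, 32, 32)
score = VolumeAnomalyNet()(volume)  # higher score -> more likely abnormal
```

Because each volume receives its own score, thresholding the scores over a grid of volumes yields both detection and spatial localization of the anomalous region.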
