Abstract

With the growth of crowd phenomena in the real world, crowd scene understanding has become an important task in anomaly detection and public security. Visual ambiguities and occlusions, high density, low mobility, and complex scene semantics, however, make this problem highly challenging. In this paper, we propose an end-to-end architecture, Convolutional DLSTM (ConvDLSTM), for crowd scene understanding. ConvDLSTM combines a GoogleNet Inception V3 convolutional neural network (CNN) with stacked differential long short-term memory (DLSTM) networks. Unlike traditional non-end-to-end solutions, which separate feature extraction from parameter learning, ConvDLSTM optimizes the parameters of the CNN and the recurrent network jointly within a unified model, and thus has the potential to produce a more coherent model. The proposed architecture takes sequences of raw images as input and does not rely on tracklet or trajectory detection, giving it clear advantages over traditional flow-based and trajectory-based methods, especially in challenging crowd scenarios with high density and low mobility. By exploiting the semantic representations of the CNN and the memory states of the LSTM, ConvDLSTM effectively analyzes both crowd scene appearance and motion information. Existing LSTM-based crowd scene solutions exploit temporal information and are claimed to be deep in time; ConvDLSTM, in contrast, models spatial and temporal information in a unified architecture and is deep in both space and time. Extensive performance studies on the Violent-Flows and CUHK Crowd datasets show that the proposed technique significantly outperforms state-of-the-art methods.
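
The following is a minimal, illustrative sketch of the CNN-to-stacked-LSTM pipeline the abstract describes, assuming PyTorch. A small placeholder frame encoder stands in for the GoogleNet Inception V3 backbone, and frame-to-frame feature differences crudely approximate the differential LSTM input; all names, dimensions, and the differencing scheme are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ConvDLSTMSketch(nn.Module):
    def __init__(self, feat_dim=256, hidden_dim=128, num_layers=2, num_classes=2):
        super().__init__()
        # Placeholder frame encoder; the paper uses GoogleNet Inception V3 features.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        # Stacked LSTM over per-frame features and their temporal differences
        # (a crude stand-in for the paper's differential LSTM formulation).
        self.lstm = nn.LSTM(2 * feat_dim, hidden_dim, num_layers, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, clips):                                  # clips: (B, T, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1)).view(b, t, -1)   # (B, T, feat_dim)
        # Frame-to-frame feature change; zero for the first frame.
        diffs = torch.cat([torch.zeros_like(feats[:, :1]),
                           feats[:, 1:] - feats[:, :-1]], dim=1)
        out, _ = self.lstm(torch.cat([feats, diffs], dim=-1))
        return self.head(out[:, -1])                           # classify from the last step

model = ConvDLSTMSketch()
logits = model(torch.randn(2, 8, 3, 112, 112))                 # two clips of eight frames
print(logits.shape)                                            # torch.Size([2, 2])
```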
