Abstract

Crowd scene understanding is a challenging task of particular importance in computer vision. Crowd scene categories are often defined by multi-level information, which leads to large intra-class variation, and crowd dynamics take different forms across different crowd systems. Large-scale crowd scene datasets and quantified generic properties for crowd representation are therefore key issues for this topic. This paper proposes a Two-Stream Residual Network (TSRN), a deep model that jointly learns and aggregates appearance and motion features for crowd understanding. The appearance stream is generated from a static frame through a Residual Network. The motion stream is generated from three scene-independent motion maps (collectiveness, stability, and conflict) as a complement to the appearance stream. Experiments are conducted on a large-scale crowd video dataset, Who do What at some Where (WWW), devised for understanding crowded scenes. The results show excellent accuracy compared with prior hand-crafted and deep learning methods, attaining 88% and 74.9% accuracy for the appearance and motion streams respectively, and 89% accuracy for the combined two-stream ResNet.
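
The abstract describes a two-stream design: a ResNet over the static frame and a ResNet over the stacked motion maps, with the two feature sets aggregated for prediction. Below is a minimal sketch of that idea in PyTorch. The backbone depth (ResNet-50), late fusion by feature concatenation, the attribute count, and all class and parameter names (`TwoStreamResNet`, `num_attributes`) are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal two-stream ResNet sketch for crowd scene attribute prediction.
# Assumptions: ResNet-50 backbones, late fusion by concatenation, and a
# 3-channel motion input (collectiveness, stability, conflict maps).
import torch
import torch.nn as nn
from torchvision.models import resnet50


class TwoStreamResNet(nn.Module):
    def __init__(self, num_attributes: int = 94):
        super().__init__()
        # Appearance stream: a ResNet over the static RGB frame.
        self.appearance = resnet50(weights=None)
        self.appearance.fc = nn.Identity()
        # Motion stream: a ResNet over the three stacked motion maps,
        # one map per input channel.
        self.motion = resnet50(weights=None)
        self.motion.fc = nn.Identity()
        # Late fusion: concatenate the two 2048-d feature vectors and
        # predict per-attribute scores.
        self.classifier = nn.Linear(2048 * 2, num_attributes)

    def forward(self, frame: torch.Tensor, motion_maps: torch.Tensor) -> torch.Tensor:
        appearance_feat = self.appearance(frame)     # (B, 2048)
        motion_feat = self.motion(motion_maps)       # (B, 2048)
        fused = torch.cat([appearance_feat, motion_feat], dim=1)
        return self.classifier(fused)                # (B, num_attributes)


# Usage: a single 224x224 frame and its three motion maps.
model = TwoStreamResNet()
frame = torch.randn(1, 3, 224, 224)
motion_maps = torch.randn(1, 3, 224, 224)
logits = model(frame, motion_maps)
print(logits.shape)  # torch.Size([1, 94])
```

Concatenation followed by a linear classifier is only one possible aggregation strategy; score averaging or weighted fusion of per-stream predictions would fit the same two-stream framework.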
