Crowd scene analysis has received growing attention due to its wide range of applications. Accurately localizing individuals in a crowd is important for identifying high-risk regions. In this article, we propose a Compressed Sensing based Output Encoding (CSOE) scheme, which recasts the detection of pixel coordinates of small objects as a signal regression task in an encoded signal space. To prevent vanishing gradients, we derive a sparse reconstruction backpropagation rule that adapts to distinct implementations of sparse reconstruction and makes the whole model end-to-end trainable. With the support of CSOE and the backpropagation rule, the proposed method is more robust to deep model training error, which is especially harmful to crowd counting and localization. The proposed method achieves state-of-the-art performance on four mainstream datasets, with especially strong results in highly crowded scenes. A series of analyses and experiments supports our claim that, for highly crowded scenes, regression in CSOE space is better than the traditional approach of detecting the coordinates of small objects in pixel space.
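To make the encoding idea concrete, the following is a minimal sketch of generic compressed sensing applied to a sparse location map, not the CSOE implementation itself. It assumes a random Gaussian sensing matrix and uses orthogonal matching pursuit (OMP) as the sparse reconstruction step; the names `encode`, `omp`, and `decode`, the image size, and the signal length are all illustrative choices that do not come from the article. The network regression and the backpropagation rule are not modeled here.

```python
import numpy as np

def encode(points, shape, A):
    """Flatten a binary location map and project it with the sensing matrix A."""
    x = np.zeros(shape[0] * shape[1])
    for r, c in points:
        x[r * shape[1] + c] = 1.0
    return A @ x

def omp(A, y, k):
    """Orthogonal matching pursuit: greedily recover a k-sparse x from y = A x."""
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        # Pick the column most correlated with the current residual.
        idx = int(np.argmax(np.abs(A.T @ residual)))
        if idx in support:
            break
        support.append(idx)
        # Least-squares fit on the selected columns, then update the residual.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x_hat = np.zeros(A.shape[1])
    x_hat[support] = coef
    return x_hat

def decode(x_hat, shape, threshold=0.5):
    """Map the nonzero entries of the reconstruction back to (row, col) pairs."""
    idx = np.flatnonzero(x_hat > threshold)
    return sorted((int(i // shape[1]), int(i % shape[1])) for i in idx)

# Assumed toy setup: a 32x32 map (1024 pixels) encoded into a 128-dim signal.
rng = np.random.default_rng(0)
shape = (32, 32)
m = 128
A = rng.standard_normal((m, shape[0] * shape[1])) / np.sqrt(m)

points = [(2, 3), (10, 7), (31, 31)]  # hypothetical head locations
y = encode(points, shape, A)          # the signal a model would regress
recovered = decode(omp(A, y, k=len(points)), shape)
```

In this toy setting the regression target `y` is eight times shorter than the pixel map, and the locations are read back from it by sparse reconstruction rather than detected directly in pixel space.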