Abstract

We show that crowd counting can be formulated as a sequential decision-making (SDM) problem. Inspired by how humans count, we avoid the one-step estimation used by most existing counting models and decompose counting into a sequence of sub-decision problems. For implementation, a key insight is to interpret sequential counting as a physical process in reality: scale weighing. This analogy allows us to implement a novel "counting scale" termed LibraNet. The idea is that, when a crowd image is placed on the scale, LibraNet (the agent) learns to place appropriate weights to match the count: at each step, one weight (the action) is chosen from the weight box (the predefined action pool), conditioned on the image features and the weights already placed (the state), until the pointer (the agent output) indicates balance. We investigate two forms of state definition and explore four LibraNet implementations under different learning paradigms: deep Q-network (DQN), actor-critic (AC), imitation learning (IL), and mixed AC+IL. Experiments show that LibraNet indeed mimics scale weighing, that it outperforms or performs comparably to state-of-the-art approaches on five crowd counting benchmarks, that it can be used as a plug-in to improve off-the-shelf counting models, and, in particular, that it demonstrates remarkable cross-dataset generalization. Code and models are available at https://git.io/libranet.
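
The weighing loop described above can be illustrated with a minimal sketch. It is not the released LibraNet code: the weight values in `WEIGHT_BOX`, the use of `None` as an "end" action, the `max_steps` cap, and the `greedy_oracle` policy are all assumptions made for illustration. In particular, the oracle reads the true count directly, whereas the agent in the abstract must learn its policy from image features and the placed weights.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical weight box (action pool): signed weights plus None as an "end" action.
WEIGHT_BOX: Tuple[Optional[int], ...] = (-10, -5, -2, -1, 1, 2, 5, 10, None)

# A policy maps (observation, weights placed so far) to the next action.
Policy = Callable[[int, List[int]], Optional[int]]


def weigh(policy: Policy, observation: int, max_steps: int = 50) -> Tuple[int, List[int]]:
    """Place one weight per step until the policy signals balance (None)."""
    placed: List[int] = []
    for _ in range(max_steps):
        action = policy(observation, placed)
        if action is None:  # the pointer indicates balance: stop weighing
            break
        placed.append(action)
    # The predicted count is the sum of all placed weights.
    return sum(placed), placed


def greedy_oracle(true_count: int, placed: List[int]) -> Optional[int]:
    """Stand-in policy (assumption): sees the true count and picks the weight
    that leaves the smallest remaining error. A learned agent would replace this."""
    error = true_count - sum(placed)
    if error == 0:
        return None
    candidates = [w for w in WEIGHT_BOX if w is not None]
    return min(candidates, key=lambda w: abs(error - w))


if __name__ == "__main__":
    count, weights = weigh(greedy_oracle, 23)
    print(count, weights)
```

Swapping `greedy_oracle` for a learned policy with the same signature (for example, one trained with DQN, AC, or IL as the abstract lists) leaves the `weigh` loop unchanged, which is the point of separating the sequential process from the decision rule.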
